Abstract



The Higgs boson is the only elementary particle predicted by the Standard Model (SM) 
that has not yet been observed experimentally. If it exists, it explains the spontaneous 
electroweak symmetry breaking and the origin of mass for gauge bosons and fermions. We 
test the validity of the SM by performing a search for the associated production of a Higgs 
boson and a W boson in the channel where the Higgs boson decays to a bottom-antibottom 
quark pair and the W boson decays to a charged lepton and a neutrino (the WH channel).
We study a dataset of proton-antiproton collisions at a centre-of-mass energy √s = 1.96 TeV
provided by the Tevatron accelerator, corresponding to an integrated luminosity of 5.7 fb⁻¹,
and recorded using the Collider Detector at Fermilab (CDF). We select events consistent with 
the signature of exactly one charged lepton (electron or muon), missing transverse energy 
due to the undetected neutrino (MET) and two collimated streams of particles (jets), at least 
one of which is required to be identified as originating from a bottom quark. We improve 
the discrimination of Higgs signal from backgrounds through the use of an artificial neural 
network. Using a Bayesian statistical inference approach, we set a 95% credibility level
(CL) upper limit on the ratio between the Higgs production cross section times branching
fraction and the SM prediction for each hypothetical Higgs boson mass in the range
100 - 150 GeV/c² in 5 GeV/c² increments.
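
The limit-setting step can be illustrated with a minimal Python sketch for a single-bin counting experiment. It assumes a flat prior on the signal strength and no systematic uncertainties, which is a strong simplification and not the procedure of this analysis (which uses the neural network output as the final discriminant). The function names, the integration range `mu_max` and the event counts in the usage lines are hypothetical.

```python
import math


def poisson_probability(n_obs, expected):
    """Poisson probability of observing n_obs events for a given expected mean."""
    return math.exp(n_obs * math.log(expected) - expected - math.lgamma(n_obs + 1))


def bayesian_upper_limit(n_obs, background, signal_sm, credibility=0.95,
                         mu_max=50.0, steps=20000):
    """Upper limit on mu = (cross section x branching fraction) / SM prediction.

    Assumes a flat prior on mu >= 0, truncated at mu_max, and integrates the
    posterior numerically with the midpoint rule.
    """
    d_mu = mu_max / steps
    grid = [(i + 0.5) * d_mu for i in range(steps)]
    posterior = [poisson_probability(n_obs, background + mu * signal_sm)
                 for mu in grid]
    total = sum(posterior)
    running = 0.0
    for mu, density in zip(grid, posterior):
        running += density
        if running >= credibility * total:
            return mu + 0.5 * d_mu  # upper edge of the bin that crosses the threshold
    return mu_max


# Mass grid matching the text: 100 to 150 GeV/c^2 in 5 GeV/c^2 steps.
mass_points = list(range(100, 151, 5))

# Hypothetical counts: no observed events, 0.5 background and 1 SM signal event expected.
limit = bayesian_upper_limit(n_obs=0, background=0.5, signal_sm=1.0)
```

With zero observed events the posterior is a pure exponential in the signal strength, so the sketch reproduces the familiar value of about 3 signal events at 95% credibility. In a real search the same integration would be repeated at each mass point with the corresponding predicted yields.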

Our main original contributions are the addition of a novel charged lepton reconstruction
algorithm with looser requirements (ISOTRK) than the electron or muon tight
criteria (TIGHT), as well as the introduction of a novel trigger-combination method that
maximizes the event yield while avoiding trigger correlations and that is used for
the ISOTRK category.

The ISOTRK candidate is a high-transverse-momentum good-quality track isolated from 
other activity in the tracking system and not required to match a calorimeter cluster, as 
for a tight electron candidate, or an energy deposit in the muon detector, as for a tight 
muon candidate. The ISOTRK category recovers real charged leptons that otherwise would 
be lost in the non-instrumented regions of the detector. This allows the reconstruction of 
more W boson candidates, which in turn increases the number of reconstructed WH signal 
candidate events, and therefore improves the sensitivity of the WH search. 

For the TIGHT charged lepton categories, we employ charged-lepton-dedicated triggers to
improve the rate of WH signal acceptance during data taking. Since there is no ISOTRK-dedicated
trigger at CDF, for the ISOTRK charged lepton category we employ three MET-plus-jets-based
triggers. For each trigger we first identify the jet selection where the trigger
efficiency is flat with respect to jet information (transverse energy and direction of motion
in the transverse plane for the two jets in the event) and then we parametrize the trigger
efficiency as a function of trigger MET. On an event-by-event basis, for each trigger we
compute a trigger efficiency as a function of trigger parametrization, trigger MET, jet
information, trigger prescale and information about whether the trigger is defined or not. For
the ISOTRK category we combine the three triggers using a novel method, which allows the
combination of any number of triggers in order to maximize the event yield while avoiding
trigger correlations. On an event-by-event basis, only the trigger with the largest efficiency
is used. Avoiding a logical "OR" between triggers costs some event yield, but this loss
is compensated by a smaller and easier-to-compute
systematic uncertainty.

The addition of the ISOTRK charged lepton category to the TIGHT category produces an 
increase of 33% in the WH signal yield and a decrease of 15.5% to 19.0% in the median 
expected 95% CL cross-section upper limits across the entire studied Higgs mass interval. 
The improvement in analysis sensitivity is smaller than the improvement in signal yield 
because the ISOTRK category has a smaller signal over background ratio than the TIGHT 
category, due to the looser ISOTRK reconstruction criteria. The observed (median expected)
95% CL upper limits on the SM Higgs cross section times branching ratio range from 2.39
× SM (2.73 × SM) for a Higgs mass of 100 GeV/c² to 31.1 × SM (31.2 × SM) for a Higgs
mass of 150 GeV/c², while the value for a 115 GeV/c² Higgs boson is 5.08 × SM
(3.79 × SM).

The novel trigger combination method is already in use by several CDF analyses. It is 
applicable to any analysis that uses triggers based on MET and jets, such as supersymmetry 
searches at the ATLAS and CMS experiments at the Large Hadron Collider. In its most 
general form, the method can be used by any analysis that combines any number of different 
triggers. 
Abrégé



The Higgs boson is the only elementary particle predicted by the Standard Model that has
never been observed experimentally. If it exists, it explains the spontaneous breaking of the
electroweak symmetry, as well as the mass of the W and Z bosons and of all the fermions. We
test the validity of the Standard Model by performing a search for the associated production
of a Higgs boson and a W boson in the case where the Higgs boson decays to a
bottom-antibottom quark pair and the W boson decays to a charged lepton and a
neutrino (the WH mode). Our data were accumulated by studying proton-antiproton
collisions at a centre-of-mass energy of √s = 1.96 TeV produced by the Tevatron accelerator,
with an integrated luminosity of 5.7 fb⁻¹, and collected by the Collider Detector
at Fermilab (CDF). We select events with a signature corresponding
to exactly one charged lepton (electron or muon), missing transverse energy
due to the neutrino escaping the detector (MET) and two jets of particles, at
least one of which must originate from a bottom quark. We improve the discrimination between
the Higgs signal and the background with an artificial neural network. Using Bayesian
statistical inference, we compute, for each hypothetical Higgs boson mass in the
interval 100-150 GeV/c² in increments of 5 GeV/c², a 95% credibility level (CL) upper
limit on the ratio between the cross section times
branching ratio and the value predicted by the Standard Model.

Our main contribution is the introduction of a new charged lepton reconstruction
algorithm with looser criteria (ISOTRK) than the tight criteria
used to reconstruct electron and muon candidates (TIGHT). The second major contribution
is the introduction of a new method to combine different triggers
that maximizes the number of selected events while
avoiding correlations between the triggers.

An ISOTRK candidate is a track that satisfies the quality criteria, has a
large transverse momentum, is isolated from other activity in the tracking
detector and is not required to extend into an active region of the calorimeter (muon
detector), as is the case for an electron (muon) candidate. The ISOTRK candidates recover
real charged leptons that would otherwise be lost in the non-instrumented regions of the
detector. Adding the ISOTRK category to the TIGHT category allows additional real
W bosons to be reconstructed and hence additional WH events to be recovered, which
improves the sensitivity of the WH search.



For the TIGHT charged lepton category, we use triggers dedicated to charged
leptons. Since CDF has no trigger dedicated to ISOTRK charged leptons,
we use three triggers based on MET and jets for the ISOTRK category. For each
trigger, we identify the jet selection for which the trigger efficiency is
constant with respect to the jet information (the transverse energy and the direction of motion
in the transverse plane for the two jets in the event). We then parametrize the trigger
efficiency as a function of the trigger-level MET. For each event, we
compute the trigger efficiency as a function of the trigger parametrization, the
trigger-level MET, the jet information, the trigger prescale factor
and the information on whether the trigger is defined or not. For the ISOTRK category, we
combine the three triggers with a new method that maximizes the number
of accumulated events while avoiding correlations between the triggers. For each
event, only the trigger with the largest efficiency is used. The
number of collected events is slightly smaller than for a logical "OR" between the
triggers, but this is compensated by a systematic uncertainty that is both smaller
and easier to evaluate.

Using the ISOTRK category in addition to the TIGHT category increases the
number of selected WH events by 33% and decreases the median expected upper limit
at the 95% credibility level by 15.5% to 19.0% over the entire
Higgs boson mass interval studied. The improvement in the analysis sensitivity is
smaller than the improvement in the number of collected events because the
ISOTRK category has a lower signal-to-background ratio than the TIGHT category, owing
to the looser selection criteria of the ISOTRK category. The observed (expected)
upper limit on the ratio between the cross section times branching ratio
and the value predicted by the Standard Model, at the 95% credibility level, ranges
from 2.39 × SM (2.73 × SM) for a Higgs boson of 100 GeV/c² to 31.1 × SM
(31.2 × SM) for a Higgs boson of 150 GeV/c². For a Higgs boson of
115 GeV/c², the value is 5.08 × SM (3.79 × SM).

The new method for combining different triggers is already used by several
analyses performed at CDF. It can be used by any analysis that combines several
triggers based on MET and jets, such as searches for supersymmetry
with the ATLAS and CMS detectors at the Large Hadron Collider.
In its most general form, the method can be used by any analysis that uses
a variable number of different triggers.
was able to win prizes at national competitions and olympiads in Physics and French, which
won me a partial scholarship to do my university studies in France. I owe my health and
my school successes to Stefan Brinzan.
Thesis Overview



Chapter [1] presents the theory of elementary particles and their interactions (the Standard
Model) in general and the spontaneous symmetry breaking through the Higgs mechanism in 
particular. Possible mechanisms to achieve the spontaneous symmetry breaking in theories 
beyond the Standard Model are also briefly discussed. 

Chapter [2] reviews the direct Standard Model Higgs boson searches at the LEP, Tevatron 
and LHC particle accelerators and the indirect constraints on the Higgs boson mass from fits 
of precision electroweak data to the Standard Model theory predictions. The direct Higgs 
boson search presented in this thesis, as well as its motivation, are also introduced. 

Chapter [3] presents the experimental infrastructure used for this analysis: the
Fermilab particle accelerator complex, including the Tevatron accelerator, and the Collider
Detector at Fermilab, in particular its tracking, calorimeter and muon subdetector systems.

Chapter [4] presents the reconstruction of high-level objects, such as tracks, primary vertices,
calorimeter clusters, charged lepton candidates (electron, muon, isolated track), missing
transverse energy and jets. The algorithms used in this analysis that identify jets originating in
bottom quarks are also described.

Chapter [5] introduces the signal and background processes used by this analysis, describes 
how an event is simulated using a Monte Carlo generator and enumerates the generators 
used to simulate every relevant physics process. 

Chapter [6] details the online (trigger) and offline (analysis) event selection. The analysis uses
electron or muon triggers for the tight charged lepton categories and the
missing-transverse-energy-plus-jets triggers for the isolated track category. The baseline event
selection, the several b-tagging categories and the non-W (QCD) background veto are also described.

Chapter [7] presents the computation of the predicted number of WH and ZH signal events, 
and enumerates various sources of systematic uncertainties for the event yield calculation. 

Chapter [8] presents the complex methodology used to compute the estimated event yield for 
each background process, as well as the tables for background, signal and data event yields. 






Chapter [9] describes the artificial neural network (ANN) used as a final discriminant in
this analysis. After an introduction to ANNs, the variables used in the ANN training are 
enumerated, an overtraining check is presented and the ANN output shapes are overlaid for 
different signal and background processes. The second part of the chapter describes a second 
ANN used in this analysis to correct the jet energies to their true parton-level energies. 

Chapter [10] presents the final result of this analysis, namely the ratio of the 95% credibility
level upper limits on the Standard Model Higgs boson cross section times branching ratio 
to the SM predicted values. 

Chapter [11] concludes this dissertation with a review of the analysis, its methodology, our
original contributions and the results achieved. Future plans and possible improvements are 
also discussed. 

Appendix [A] describes in detail one of our major contributions to the analysis, namely the 
parameterization of each of the three missing-transverse-energy-plus-jets triggers and the 
measurement of the trigger prescale. 

Appendix [B] presents the details of another major contribution of ours to the analysis, 
namely the novel method to combine any number of triggers in order to maximize the event 
yield while avoiding the correlations between triggers. 

Appendix [C] describes the general structure, features and advantages of the data analysis 
software package used for this analysis, for which I was one of the three main developers. 

Appendix [D] enumerates the control plots relevant to proving that the analysis uses variables 
for which the background modelling in Monte Carlo simulated events agrees very well with 
the data measurements. 

The results presented in this dissertation have undergone multiple stages of intensive review
within the Collider Detector at Fermilab collaboration. As such, the findings of this study
may be disseminated publicly. They have already been presented in a talk at the New
Perspectives conference on May 31, 2011 and in a poster at the Fermilab Users Meeting on
June 1, 2011, both at Fermilab, USA.
Original Contributions



This dissertation presents an experimental search for the Standard Model Higgs boson produced
in association with a W boson in proton-antiproton collisions at the Tevatron accelerator
at Fermilab and recorded with the Collider Detector at Fermilab. The goal is to
observe or exclude the Standard Model Higgs boson and thus confirm or refute the Higgs
mechanism. Research in experimental particle physics is a collaborative effort and this WH
search is no exception. This work would not have been possible without the contribution
of the approximately 600 members of the Collider Detector at Fermilab (CDF) collaboration
in general and without that of the approximately two dozen members of the WH CDF
group in particular. In the sections below, I emphasize my own original contributions
to the WH search at CDF and point to the sections of the dissertation that explain these
in more detail, as well as to the CDF published results that employ my work. The images
used in the thesis should be assumed to be my own contribution, unless stated
otherwise in the caption by a reference or credit statement.



A New Loose Charged Lepton Channel: "Isolated Track"



I improved the sensitivity of the WH search by making possible the introduction of a novel
looser criterion to reconstruct charged lepton candidates. High-transverse-momentum good-quality
tracks that are isolated from other activity in the tracking chamber and are not required
to match a calorimeter cluster or a muon chamber deposit are called "isolated tracks" or
"ISOTRK" candidates. These loose charged lepton candidates form a sample orthogonal
to the tight charged lepton candidates (which we call "TIGHT" candidates) and also reconstruct
real charged leptons that arrive in the non-instrumented regions of the detector and
would otherwise have been lost (Chapter [4] and Chapter [6]). Thus, the number of W boson
candidate events is increased, which in turn increases the expected number of WH signal
candidate events in our analysis (Chapter [7]). Although the signal over background ratio
is worse for the ISOTRK category than for the TIGHT lepton category (Chapter [8]), the
expected upper limit on the cross section times branching ratio for the rare WH signal 
is improved for all Higgs boson mass points when both categories are used in the limit 
calculation (Chapter [10]). The final result of this dissertation is the Higgs boson upper limit
computed with the TIGHT and ISOTRK categories separately and with the two categories combined.

The WH analysis presented in this thesis is not the first one at CDF. I continuously improved
the trigger parametrization and therefore the signal yield for the ISOTRK category, applying
it to the WH searches that used 2.7 fb⁻¹, 4.3 fb⁻¹ and 5.7 fb⁻¹ of integrated luminosity
in different years. The WH search presented in this dissertation introduces for the first
time a novel method to combine different triggers. This method is applied to
two previously used triggers and one new trigger to arrive at an updated set of results in a
sample of 5.7 fb⁻¹ of integrated luminosity.

Three "MET + Jets" Trigger Parameterization 

Since the CDF detector does not have a dedicated "isolated track" trigger, we use triggers
based on the event information orthogonal to the charged lepton information, i.e. missing
transverse energy (MET) and jets. We have three such MET-based triggers at CDF. In
order for these to be usable for analyses, one needs to parametrize for each trigger the
efficiency turn-on curves for each trigger level, compute the trigger prescale and identify the
jet selection needed so that the trigger efficiency is flat with respect to jet information. These
measurements are done in data samples. This complex parametrization is used to compute
a trigger efficiency for each Monte-Carlo-simulated event using event information. The
simulated events are weighted by the trigger efficiency. One of my important contributions
to the WH search was determining the MET-based trigger parametrization in new datasets
and continuously improving the parametrization methodology.
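
The weighting of simulated events described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the parametrization measured in this thesis: the sigmoid shape of the turn-on curve, the parameter values, the three-level list, the treatment of the prescale as a single kept fraction and the names `TurnOn` and `trigger_weight` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnOn:
    """Hypothetical sigmoid turn-on curve for one trigger level."""
    plateau: float   # efficiency reached at large trigger MET
    midpoint: float  # trigger MET (GeV) where the efficiency is half the plateau
    width: float     # rise scale of the curve (GeV)

    def efficiency(self, met):
        return self.plateau / (1.0 + math.exp(-(met - self.midpoint) / self.width))


def trigger_weight(met, levels, passes_jet_selection, prescale_fraction=1.0):
    """Event weight for one trigger: product of the per-level efficiencies.

    Events outside the jet selection where the efficiency is flat in the jet
    variables get weight 0 here (an assumption of this sketch).
    prescale_fraction is the fraction of events kept by the prescale.
    """
    if not passes_jet_selection:
        return 0.0
    weight = prescale_fraction
    for level in levels:
        weight *= level.efficiency(met)
    return weight


# Hypothetical turn-on parameters for three trigger levels.
levels = [TurnOn(0.99, 25.0, 4.0), TurnOn(0.98, 35.0, 5.0), TurnOn(0.97, 40.0, 5.0)]
weight = trigger_weight(60.0, levels, passes_jet_selection=True)
```

Because the jet selection is chosen so that the efficiency is flat in the jet variables, the weight in this sketch depends on the trigger MET only, which keeps the parametrization one-dimensional.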

My parametrization of a first MET-based trigger was used for the ISOTRK charged lepton
channel of the WH analysis with 2.7 fb⁻¹ that is described in detail in the PhD dissertation
of Jason Slaunwhite from the summer of 2008 [1]. A Physical Review D paper draft based
on this analysis is currently under the second and last review of the CDF collaboration.
Our analysis is a WH analysis using an artificial neural network. There is also another WH
analysis using matrix element computations as inputs to a multivariate technique involving
a boosted decision tree approach. The two analyses, which use the same event selection and
an integrated luminosity of 2.7 fb⁻¹, were combined in the summer of 2008. The combination
increased the WH search sensitivity by 10% over the best of the two analyses. In
this combination, both analyses used the ISOTRK charged lepton channel and the trigger
parametrization I had measured. The result was presented at the summer conferences of
2008 and published in a Physical Review Letters paper [2].

I improved the methodology for trigger parametrization, and two MET-based triggers were
used for the ISOTRK charged lepton category in the WH analysis with 4.3 fb⁻¹ that is
described in detail in the PhD dissertation of Yoshikazu Nagai from the summer of 2009 [3].
The combination of the two triggers avoided correlations between triggers since the events
were divided into two orthogonal kinematic regions based on jet information. Each trigger
was used for all the events in a given kinematic region and ignored for the events in the
other kinematic region. The result was one of the inputs for the limit calculation for the
CDF combination of August 2009 presented in Figure [2.9] and for the Tevatron combination
of November 2009 presented in the bottom part of Figure [2.10]. The results were presented
at the summer conferences of 2009.

I used the same methodology to parametrize the trigger efficiency turn-on curves using data
collected with a total integrated luminosity of 5.7 fb⁻¹. These were used to update the
WH analysis, whose result was again one of the inputs for the limit calculation for the CDF
combination of July 2010 presented in Figure [2.8] and for the Tevatron combination of July
2010 presented in the top part of Figure [2.10]. The results were presented at the summer
conferences of 2010. A Physical Review D paper draft based on this analysis is currently
under development.

I later parametrized a third additional MET-based trigger that did not exist from the
beginning of Run II of the Tevatron, but was introduced after about 2.7 fb⁻¹ had already been
collected, thanks to the effort of the CDF Higgs Trigger Task Force. For each of the three
MET-based triggers, I measured the prescale and the necessary jet variable selection so that
the parametrization is done as a function of missing energy only. A detailed description of
the MET-based trigger parametrization is given in Appendix [A].

For three triggers, it is more complex to divide the jet kinematic phase space in orthogonal 
regions and study which trigger is on average more efficient for each region. It is in this 
context that I introduced a novel method to combine any number of triggers in order to 
maximize the event yield and yet not have an "OR" between the triggers in order to avoid 
trigger correlations and thus systematic uncertainty estimation difficulties. 

Novel Method to Combine Triggers 

The novel method I introduced generalizes the previous method by considering each individual
event as an infinitesimally small kinematic region. Just as before, only one trigger
is assigned to all the events in this kinematic region. A study is performed to check which
trigger is more efficient in this region. However, since the region consists of only one event,
the study consists simply of comparing the efficiencies of the three triggers for this
event and assigning to the event only the trigger with the largest efficiency. For a data event,
we check whether the event has fired the chosen trigger. If it has, the event is kept.
If it has not, the event is rejected without checking whether it has fired the other triggers,
which ensures that correlations between triggers are avoided. For a Monte-Carlo-simulated
event, the trigger is always assumed to fire and the event is assigned a weight equal to the
efficiency of the chosen trigger.
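
The per-event logic described above can be sketched in a few lines of Python. This is an illustration only and not the ABCDF implementation: the trigger names `MET_A`, `MET_B` and `MET_C`, the function names and the efficiency values are hypothetical, and a trigger that is not defined for the event is assumed to enter with efficiency 0.

```python
def choose_trigger(efficiencies):
    """Return the name of the trigger with the largest efficiency for this event."""
    return max(efficiencies, key=efficiencies.get)


def keep_data_event(efficiencies, fired):
    """Data event: keep it only if the chosen trigger fired.

    The other triggers are never consulted, so no logical OR is taken.
    """
    return choose_trigger(efficiencies) in fired


def simulated_event_weight(efficiencies):
    """Simulated event: always kept, weighted by the chosen trigger's efficiency."""
    return efficiencies[choose_trigger(efficiencies)]


# Hypothetical per-event efficiencies for three MET-based triggers.
event_efficiencies = {"MET_A": 0.62, "MET_B": 0.81, "MET_C": 0.0}

chosen = choose_trigger(event_efficiencies)
kept = keep_data_event(event_efficiencies, fired={"MET_A"})
weight = simulated_event_weight(event_efficiencies)
```

In this example the data event is rejected even though `MET_A` fired, because the trigger assigned to the event is `MET_B`. Adding a fourth trigger only requires one more entry in the dictionary, which is why the method extends to any number of triggers.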

Although the event yield is slightly smaller than in the case when we take a simple "OR"
between triggers, a reduction in the systematic uncertainty of the event yield is expected
due to avoiding correlations between the triggers. It also becomes easier to evaluate the
systematic uncertainty. To achieve this, I also parametrized the trigger efficiency turn-on
curves in several bins of several variables. A detailed description of the novel method to
combine triggers is given in Appendix [B].
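
The size of the yield loss with respect to a logical "OR" can be bracketed without knowing the correlations between the triggers, since the efficiency of an "OR" is never below that of the best single trigger and never above the sum of the individual efficiencies. The sketch below is an illustration of these general probability bounds, not a result of this thesis; the function names and the efficiency values are hypothetical.

```python
def or_efficiency_bounds(efficiencies):
    """Bounds on the efficiency of a logical OR, valid for any correlation."""
    lower = max(efficiencies)            # the OR is at least as efficient as the best trigger
    upper = min(1.0, sum(efficiencies))  # union bound, capped at 1
    return lower, upper


def or_efficiency_if_independent(efficiencies):
    """OR efficiency under the (generally unrealistic) assumption of independence."""
    miss = 1.0
    for eff in efficiencies:
        miss *= 1.0 - eff
    return 1.0 - miss


# Hypothetical per-event efficiencies of three triggers.
lower, upper = or_efficiency_bounds([0.9, 0.5, 0.3])
independent = or_efficiency_if_independent([0.9, 0.5, 0.3])
```

The lower bound is exactly the efficiency used by the novel method, so for an event with one highly efficient trigger the loss with respect to an "OR" is at most a few percent. Computing the true "OR" efficiency would require the correlations between the triggers, which is the difficulty the novel method avoids.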

I also developed a software package called ABCDF, which is easily portable to any CDF
analysis (and in fact to any high-energy-physics data analysis framework). ABCDF takes
as input all the relevant information from a given event and returns the trigger efficiency
(event weight due to the trigger parametrization). The method works with an unlimited
number of triggers, so other triggers can in principle be added in order to increase the signal
acceptance.

The analysis presented in this thesis uses the same integrated luminosity as the analysis of
the summer of 2010, but introduces for the first time the third MET-based trigger and the novel
methodology to combine triggers, which improves the signal acceptance and the analysis
sensitivity even more. As soon as this PhD dissertation is submitted, I will continue to
work on the WH analysis as a postdoctoral URA Visiting Scholar Fellow in order to make
use of the latest available integrated luminosity and add further improvements to be part of
the Summer 2011 CDF and Tevatron combinations and to be shown at the Summer 2011
conferences.

Main Author of a New Data Analysis Framework 

As part of the subgroup that searches for the Higgs boson produced in association with a W
boson and decaying into a bottom quark pair (WH group), I am one of the main authors
of a new data analysis software framework, called WHAM, that performs several
analyses that use the signature of exactly one charged lepton plus missing transverse energy
plus a number of tight jets, such as WH, WZ and Technicolor searches and single-top
measurements. WHAM allows all these analyses to share common tools in order to produce
a larger number of scientific results with less manpower. WHAM will play a crucial role
from now until the end of the CDF analysis effort, as the CDF collaboration manpower
decreases. Since I integrated my ABCDF software package fully into WHAM, most of these
ongoing analyses are benefiting from the ISOTRK channel or from other loose muon channels
that make use of the three MET-based triggers combined with the novel method. For these
reasons, I will be a main author for the subsequent publications of these analyses in the
WH group at CDF. WHAM is described in detail in Appendix [C].

The work on WHAM has delayed my thesis submission by about a year. Yet, it was time 
well spent and a huge investment in the analysis power in the lepton plus jet signature at 
CDF, as well as in my coding and data analysis skills. 
"Isolated Track" Scale Factor Measurement



As for all charged lepton categories, ISOTRK candidates have a different reconstruction
efficiency in Monte Carlo simulated events and in data events. A postdoctoral
researcher (Nils Krumnack) wrote code to measure these efficiencies for each jet multiplicity
and compute their ratio, which is also known as the "scale factor" between data and
Monte Carlo (Subsubsection [4.4.4]). For the past two years I have maintained this code and
updated the scale factor value and its control plots for new data periods being processed for
the collaboration.
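
The scale factor defined above is a ratio of two efficiencies, which can be sketched as follows. This is a minimal illustration and not the code used by the collaboration: the function names, the simple binomial uncertainty, the uncorrelated error propagation and all event counts are assumptions made for the example.

```python
import math


def efficiency(passed, total):
    """Efficiency and its binomial uncertainty from pass/total counts."""
    eff = passed / total
    err = math.sqrt(eff * (1.0 - eff) / total)
    return eff, err


def scale_factor(data_passed, data_total, mc_passed, mc_total):
    """Data over Monte Carlo efficiency ratio, with uncorrelated error propagation."""
    eff_data, err_data = efficiency(data_passed, data_total)
    eff_mc, err_mc = efficiency(mc_passed, mc_total)
    ratio = eff_data / eff_mc
    relative = math.sqrt((err_data / eff_data) ** 2 + (err_mc / eff_mc) ** 2)
    return ratio, ratio * relative


# Hypothetical counts (data passed, data total, MC passed, MC total) per jet multiplicity.
counts = {1: (900, 1000, 9500, 10000), 2: (430, 500, 4700, 5000)}
scale_factors = {n_jets: scale_factor(*c) for n_jets, c in counts.items()}
```

In such a scheme each simulated event with an ISOTRK candidate would be multiplied by the scale factor of its jet multiplicity, so that the simulated reconstruction efficiency matches the one measured in data.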
Chapter 1



The Standard Model and the 
Higgs Mechanism 

1.1 Elementary Particle Physics 

Humans have always asked themselves how the world works. They looked around them and
observed a large diversity of plants, animals and minerals. They wanted to know what all
these are made of and how the constituent elements stay together. In other words, they
wanted to know what the fundamental ingredients of matter are and what recipe Nature
uses to produce out of these fundamental ingredients all the diversity of things, both
alive and inert, that exist.

Elementary particle physics¹ is a domain of physics that uses the scientific method to describe
the fundamental building blocks of matter and the elementary interactions between them.
The current elementary particle physics theory is called the Standard Model (SM). Before
we examine the Standard Model in more detail, let us make a short incursion into the history
of the human attempts to answer these fundamental questions about the Universe. Let us
examine briefly the advance of science that led to the advent of elementary particle physics.

1.1.1 Short History of Quest for Ingredients of Matter 

At first, humans attributed supernatural causes to everyday phenomena. Their explanations 
touched on superstition and religion. 

However, about 2,500 years ago, people living in Ancient Greece who were passionate about
understanding the Universe, and who called themselves philosophers, started to search for natural
causes for natural phenomena. In doing so, they created science and founded the first schools
where science was studied and promoted. Two schools of thought in Ancient Greece proposed
two different answers to what the fundamental ingredients of the Universe were. One
proposed that everything is made of various combinations of four fundamental elements:
earth, water, air and fire². Another one proposed that everything is made up of many
types of indivisible spheres called atoms³. Both were closer to philosophy than to scientific
theories and neither had any notion of what the underlying laws of the Universe could be.

1 Elementary particle physics is also called particle physics, subatomic physics or high energy physics.

The next big step happened around 500 years ago in Western Europe when ancient science 
evolved into modern science. Modern science used both experiments and mathematical 
formalism to advance knowledge of natural phenomena. Modern science made the jump from 
simple observation of the world to performing reproducible experiments. The advancement 
of mathematics allowed for a mathematical formulation of scientific ideas. Scientific theories 
used mathematics to make numerical predictions about observable quantities in nature. 
Scientific experiments measured these quantities. If there was an agreement, the theory was 
potentially confirmed. Otherwise, the theory was contradicted. It is experiment that is the
supreme judge of truth in the scientific method⁴.

Modern science was indeed successful. First⁵, it showed that the fundamental four elements
of Ancient Greece were not fundamental after all, but rather made of other elements. Tens of
such elements were discovered and studied by a scientific field called chemistry. Therefore,
fundamental elements of the Universe were thought to be these chemical elements, but
no fundamental interaction was known between these. Next⁶, chemists showed that each
chemical element is made of a certain type of atom, an idea also originating in Ancient
Greece, but whose experimental demonstration came only in the 19th century.

Then, another field of science called physics showed that even atoms are not elementary, but 
rather they have a structure. Inside atoms there are elementary particles called electrons 
that have a negative electric charge. It is the electric interaction between electrons that 
keeps atoms together in molecules and molecules together in inorganic matter or cells of 
living organisms. Therefore, for the first time, a recipe of the Universe was proposed: the 
electric force. Physics pushed further and discovered that an atom also contains a positively
charged nucleus. The same electric force now explained the structure and stability of atoms.

Later⁷, it was shown that the electric force and the magnetic force are two different aspects
of one fundamental force: the electromagnetic force. Next⁸, physics showed that even nuclei
have a structure, as they are made of protons and neutrons. A new sub-domain of physics
was formed: nuclear physics. In order to explain how protons and neutrons are kept together
inside a nucleus, a new interaction was introduced, namely the strong nuclear interaction.
In order to explain why certain types of atoms are unstable and decay into other types
of atoms plus some radiation, a third interaction was introduced, namely the weak nuclear
interaction. It was the second time that fundamental recipes of the Universe were introduced.
At that stage the fundamental ingredients of the Universe were the proton, the neutron and
the electron, and the fundamental interactions of the Universe were the electromagnetic force, the
gravitational force, the strong nuclear force and the weak nuclear force.

2 This line of thought was first mentioned in the work of the Greek philosopher Empedocles around 450
BC.

3 This school of thought was introduced by the Ancient Greek philosophers Leucippus, Democritus and
Epicurus.

4 However, it is possible for the outcomes of scientific experiments to agree with theories that are incorrect.

5 The generation of chemists led by the French chemist Antoine-Laurent de Lavoisier (1743 - 1794)
founded modern chemistry by discovering the chemical elements.

6 The British chemist John Dalton (1766 - 1844) discovered the atoms.

7 The British theoretical physicist James Clerk Maxwell (1831 - 1879) built in 1865 the theory of
electromagnetism that described the electric force, the magnetic force and light.

8 The New Zealand-born experimental physicist Ernest Rutherford (1871 - 1937) discovered the nucleus in
1910 and the proton in 1919. His student, the British experimental physicist James Chadwick (1891 - 1974),
discovered the neutron in 1932.

But physicists did not stop there. They showed that not even protons and neutrons are 
elementary particles. Instead, they are each formed by three main quarks and a multitude of 
other quarks and antiquarks, all kept together by the exchange of new elementary particles, 
gluons. A new sub-domain of physics was thus formed: particle physics. This is the domain
of science discussed in this thesis. 

1.1.2 Current Paradigm in Particle Physics 

The current paradigm in particle physics is expressed in the theory of the Standard Model of
particle physics. According to the SM, matter is formed of six types of quarks and six types
of leptons. These elementary particles interact with each other through exchanges of other
elementary particles called gauge bosons. There are four fundamental interactions in nature,
but the gravitational force is not described by the SM; its strength is negligible at
the energies currently available in our particle accelerators. The
SM describes the strong force mediated by gluons, the weak force mediated by the $Z^0$, $W^+$
and $W^-$ bosons and the electromagnetic force mediated by the photon ($\gamma$).

However, cosmological experimental data in the last decade has shown that the ordinary 
matter described by the SM represents in fact only about 4% of the matter-energy content 
of the Universe, while a currently unknown type of matter forms about 22% (dark matter) 
and an unknown type of energy forms about 74% (dark energy). 

1.1.3 Accelerator and Cosmic-Ray Particle Physics 

Particle physics also has two sub-domains: accelerator-based particle physics and cosmic-ray
particle physics. In accelerator-based particle physics, subatomic particles such as protons,
antiprotons, electrons and/or positrons are produced and accelerated to very high energies,
up to velocities very close to the speed of light in vacuum. These accelerated beams of
particles are collided either head-on with other beams of particles (collider particle physics)
or with a fixed target material (fixed-target particle physics). In these collisions, the large 
kinetic energy of incoming particles is transformed into mass of new elementary particles, 
which decay into other elementary particles that are recorded by large particle detectors 
that surround the collision region.

Cosmic-ray particle physics studies collisions of very energetic cosmic subatomic particles
(protons, light nuclei) with protons and neutrons from the atoms of Earth's atmosphere.
These collisions produce a shower of particles that propagate in the atmosphere and are
detected by ground-based particle detectors. The advantage of cosmic-ray collisions is that
they are sometimes a lot more energetic than what the current particle accelerators on Earth can
offer. The disadvantage is that the experimental conditions are not reproducible and the
data rates are low.
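
The energy comparison between collider, fixed-target and cosmic-ray collisions discussed above can be made concrete with a minimal sketch. It assumes the standard relativistic expressions for the centre-of-mass energy in natural units (c = 1) and neglects the beam masses in the collider case. The function names and the example beam energies are illustrative choices, not part of the analysis described in this thesis.

```python
import math

PROTON_MASS_GEV = 0.938272  # proton rest energy in GeV (rounded measured value)


def sqrt_s_collider(beam_energy_1: float, beam_energy_2: float) -> float:
    """Centre-of-mass energy (GeV) of two beams colliding head-on.

    Ultra-relativistic approximation (E >> m): sqrt(s) = 2 * sqrt(E1 * E2).
    """
    return 2.0 * math.sqrt(beam_energy_1 * beam_energy_2)


def sqrt_s_fixed_target(beam_energy: float,
                        target_mass: float = PROTON_MASS_GEV,
                        beam_mass: float = PROTON_MASS_GEV) -> float:
    """Centre-of-mass energy (GeV) of a beam hitting a target at rest.

    s = m_beam^2 + m_target^2 + 2 * E_beam * m_target.
    """
    s = beam_mass ** 2 + target_mass ** 2 + 2.0 * beam_energy * target_mass
    return math.sqrt(s)


if __name__ == "__main__":
    # Illustrative inputs: 980 GeV beams (Tevatron-like) and a 10^20 eV cosmic proton.
    print("collider     :", sqrt_s_collider(980.0, 980.0), "GeV")
    print("fixed target :", sqrt_s_fixed_target(980.0), "GeV")
    print("cosmic ray   :", sqrt_s_fixed_target(1.0e11), "GeV")
```

For the same beam energy, the head-on configuration reaches a far larger centre-of-mass energy than the fixed-target one, because in the latter the energy available grows only with the square root of the beam energy. A cosmic-ray collision with an atmospheric nucleon is kinematically a fixed-target collision, which is why only the most energetic cosmic rays exceed accelerator energies.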

Some of the first subatomic particles were discovered in cosmic-ray collisions: the positron, 
the muon, the $\pi$ meson. Then accelerator-based particle physics started to dominate the
advancement of particle physics. In this thesis we will be discussing collider particle physics. 

1.1.4 Elementary Particles and Interactions 

Elementary particles⁹ come in two types: fermions and bosons [4]. Fermions have half-integer
spins and obey Fermi-Dirac statistics. Bosons have integer spins and obey
Bose-Einstein statistics. Elementary matter particles are fermions and they also come in
two categories: leptons and quarks. Leptons and quarks interact with each other through
the exchange of force carrier particles, which are bosons.

There are six types of leptons grouped in three weak isospin doublets. Each doublet contains
a negatively charged lepton and an electrically neutral lepton generically called a neutrino. The
first doublet is formed by an electron ($e^-$) and an electron neutrino ($\nu_e$). The second doublet
is formed by a muon ($\mu^-$) and a muon neutrino ($\nu_\mu$). The third doublet is formed by a tau
lepton ($\tau^-$) and a tau neutrino ($\nu_\tau$). Leptons interact only through the electromagnetic and
the weak forces.

There are also six types of quarks, grouped in three pairs as well. Unlike the leptons, quarks
have fractional electric charge. However, the electric charge difference between the members
of a pair is also one unit. Each pair contains a quark with electric charge equal to
$+\frac{2}{3}|e|$ and another quark with electric charge equal to $-\frac{1}{3}|e|$, where $e$ is the electric charge of
the electron. The first pair is formed by the quarks up ($u$) and down ($d$). The second pair is
formed by the quarks charm ($c$) and strange ($s$). The third pair is formed by the quarks top
($t$) and bottom ($b$). Quarks interact through all three elementary interactions, including
the strong force, which is specific to them. Each quark possesses a quantum number
called "color" that can have the values of red, green or blue¹⁰.

The three families of leptons and quarks respectively are also called generations. On average,
the particles of the second generation are more massive than the ones from the first
generation and less massive than the ones in the third generation. It is currently believed
that when the Universe was still relatively small and hot, all these elementary particles
were produced. However, as the Universe expanded and cooled, the particles from the third
generation decayed into particles of lower generations and, as the Universe expanded and
cooled even more, the particles of the second generation decayed into particles of the first
generation. Therefore, at current energies in the Universe, all matter is made up of particles
from the first generation: the up and down quarks form protons and neutrons, electrons are
part of the atoms and electron neutrinos are emitted in nuclear radioactive decay.

9 Elementary particles are also called fundamental particles.

10 These names have nothing to do with the colours from everyday life. They originate in an analogy with
light from everyday life, where the colours red, green and blue produce the colour white when combined.
The colour white is considered neutral. If for the electric force there is a need of two different charges to
produce a neutral object, the strong force needs three. This is why they bear the names of the three colours
that need to be combined to produce a neutral colour (white).

The electromagnetic interaction is mediated by the photon ($\gamma$). The weak interaction
is mediated by the $Z^0$, $W^+$ and $W^-$ bosons. The strong interaction is mediated by eight
types of gluons. Each gluon carries a combination of a color and an anticolor.

In Table 1.1 we present a summary of the elementary particles of the Standard Model and
their properties. 



Fermions  1st Gen.               2nd Gen.                     3rd Gen.                       Interaction(s)    Q     Spin
Leptons   electron ($e^-$)       muon ($\mu^-$)               tau ($\tau^-$)                 EM, Weak          -1    1/2
          e-neutrino ($\nu_e$)   $\mu$-neutrino ($\nu_\mu$)   $\tau$-neutrino ($\nu_\tau$)   Weak              0     1/2
Quarks    up ($u$)               charm ($c$)                  top ($t$)                      EM, Weak, Strong  +2/3  1/2
          down ($d$)             strange ($s$)                bottom ($b$)                   EM, Weak, Strong  -1/3  1/2

          Name                Force    Coupling     Mass (GeV/$c^2$)   Q     Spin
Gauge     photon ($\gamma$)   EM       $10^{-2}$    0                  0     1
Bosons    W boson             Weak     $10^{-13}$   80.4               ±1    1
          Z boson             Weak     $10^{-13}$   91.2               0     1
          gluon ($g$)         Strong   1            0                  0     1

Table 1.1: Standard Model elementary particles and their properties. Q denotes the
electric charge in units of $|e|$.



1.1.5 Antiparticles 

For almost every elementary particle there is an antiparticle with the same mass, spin, 
lifetime and decay width, but with opposite electric charge and other quantum numbers. 
The particles that are their own antiparticles are the photon and the $Z^0$ boson. Experiments
have not yet shown if the neutrinos are the same particle as the antineutrinos, or if they are 
different particles, as the Standard Model assumes. 

Unless otherwise explicitly mentioned, throughout this thesis all statements referring to
particles are also valid for their corresponding antiparticles.

1.2 Particle Physics Theories

1.2.1 Global Gauge Theories



The Standard Model is a local quantum field theory with a local gauge symmetry. The 
equations of motion are deduced using a least action principle. 

Historically speaking, a theory with a global gauge symmetry was first proposed by Paul
Adrien Maurice Dirac, who described the equations of motion of an electron with electric
charge and spin that did not interact with other particles (a free particle). The Dirac free
Lagrangian in fact describes all free fermion fields:

$$ L(x) = \bar{\psi}(x) (i \gamma^\mu \partial_\mu - m) \psi(x) . \tag{1.1} $$

In equation 1.1, $\psi$ represents a Dirac field of mass $m$ and $\gamma^\mu$ are Dirac's matrices. This
Lagrangian is invariant under a global U(1) transformation described by

$$ \psi(x) \to e^{iQ\alpha} \psi(x) , \tag{1.2} $$

where $Q$ is the electric charge, $x$ is a space-time 4-vector and $\alpha$ is a parameter that does not
vary with the space-time coordinates. Noether's theorem [5] states
that for every continuous symmetry there is a conserved quantity. The U(1) symmetry results in a
conserved 4-vector current $j^\mu = -Q \bar{\psi} \gamma^\mu \psi$:

$$ \partial_\mu j^\mu = 0 . \tag{1.3} $$

It means that the charge, i.e. the integral of $j^0$ over the space coordinates, is conserved
in time.
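
The step from the conserved current to the conserved charge can be written out explicitly. The short derivation below is a sketch that assumes the current vanishes sufficiently fast at spatial infinity; the symbol $Q_{\mathrm{tot}}$ is introduced here only to distinguish the total charge from the charge $Q$ of the field.

```latex
\begin{align*}
Q_{\mathrm{tot}}(t) &\equiv \int d^3x \; j^0(t, \vec{x}) , \\
\frac{d Q_{\mathrm{tot}}}{dt}
  &= \int d^3x \; \partial_0 j^0
   = - \int d^3x \; \nabla \cdot \vec{j}
   = - \oint_{S_\infty} \vec{j} \cdot d\vec{S}
   = 0 .
\end{align*}
```

The second equality uses $\partial_\mu j^\mu = 0$, and the third one is Gauss's theorem applied to a surface at infinity, where the current is assumed to vanish.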

1.2.2 Local Gauge Theories 

As real elementary particles interact with one another, interactions need to be introduced
in the free Lagrangian from equation 1.1. This is obtained by using a parameter $\alpha$ that
depends on the space-time coordinates. The U(1) transformation becomes a local U(1) transformation:

$$ \psi(x) \to e^{iQ\alpha(x)} \psi(x) . \tag{1.4} $$

Simply applying this local gauge transformation to the free Lagrangian does not keep the
Lagrangian invariant. In order to do so, the partial derivative is also replaced by a
covariant derivative that is defined by the following equation:

$$ D_\mu \equiv \partial_\mu + iQ A_\mu(x) , \tag{1.5} $$

where $A_\mu$ is a new 4-vector field that transforms as follows:

$$ A_\mu(x) \to A_\mu(x) - \partial_\mu \alpha(x) . \tag{1.6} $$

It results that

$$ D_\mu \psi(x) \to e^{iQ\alpha(x)} D_\mu \psi(x) . \tag{1.7} $$
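
The covariance of $D_\mu \psi$ can be checked in a few lines. The sketch below assumes that the covariant derivative has the form $D_\mu = \partial_\mu + iQ A_\mu$ and that the gauge field shifts by $-\partial_\mu \alpha(x)$, which is the sign convention consistent with the equation of motion of the interacting field derived from this Lagrangian.

```latex
\begin{align*}
D_\mu \psi
  &\to \left[ \partial_\mu + iQ \left( A_\mu - \partial_\mu \alpha \right) \right] e^{iQ\alpha} \psi \\
  &= e^{iQ\alpha} \left[ \partial_\mu \psi + iQ (\partial_\mu \alpha) \psi
       + iQ A_\mu \psi - iQ (\partial_\mu \alpha) \psi \right] \\
  &= e^{iQ\alpha} \left( \partial_\mu + iQ A_\mu \right) \psi
   = e^{iQ\alpha} D_\mu \psi .
\end{align*}
```

The term generated by differentiating the local phase is cancelled exactly by the shift of the gauge field, so $\bar{\psi} \gamma^\mu D_\mu \psi$ is invariant while $\bar{\psi} \gamma^\mu \partial_\mu \psi$ alone is not.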

By replacing all these in the Dirac Lagrangian of equation 1.1 we obtain the Lagrangian
for interacting fermions. This theory is called Quantum ElectroDynamics
(QED):

$$ L_{QED} = \bar{\psi}(x) (i \gamma^\mu D_\mu - m) \psi(x) - \frac{1}{4} F_{\mu\nu} F^{\mu\nu} , \tag{1.8} $$

where $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ is the field strength tensor that provides the gauge-invariant kinetic term of $A_\mu$.

The QED theory describes the electromagnetic interaction. If we apply the Euler-Lagrange
least action principle [5] to equation 1.8, we obtain the equation of motion for a field $\psi$ undergoing
electromagnetic interaction through the exchange of a massless vector field $A_\mu$:

$$ (i \gamma^\mu \partial_\mu - m) \psi(x) = Q \gamma^\mu A_\mu(x) \psi(x) . \tag{1.9} $$

Indeed, this Lagrangian does not have any mass term of the type $\frac{1}{2} m^2 A_\mu A^\mu$, as this would
break the gauge invariance. Therefore the vector field $A_\mu$ is massless. This is in complete
agreement with the fact that the gauge boson mediator for the electromagnetic interaction,
the photon, is massless, and with the infinite range of the electromagnetic force.

1.2.3 Standard Model Theory 

Local gauge theories are also used to describe the other fundamental interactions of ele- 
mentary particles. The weak interaction has already been unified theoretically with the 
electromagnetic interaction described above. The new fundamental interaction called elec- 
troweak is described by the Standard Model. 

The group $SU(2)_L \times U(1)_Y$ describes the electroweak interaction, which is spontaneously
broken into the weak interaction described by the V-A theory and the electromagnetic
interaction described by Quantum Electrodynamics (QED). SU(2) is a non-Abelian group
with the algebra of spin (the weak isospin group) and it has three gauge vector fields.



A simplified model with two massless spin-1/2 fermions $f$ and $f'$, whose electric
charges satisfy $Q_f = Q_{f'} + 1$, can explain the electroweak interaction. The V-A
currents explain the weak interactions of leptons using a left-handed doublet field and two
right-handed singlet fields:

$$ \psi_1(x) = \begin{pmatrix} f(x) \\ f'(x) \end{pmatrix}_L , \quad \psi_2(x) = f_R(x) , \quad \psi_3(x) = f'_R(x) , \tag{1.10} $$

with:

$$ f_{L,R}(x) = \frac{1}{2} (1 \mp \gamma_5) f(x) , \quad \bar{f}_{L,R}(x) = \frac{1}{2} \bar{f}(x) (1 \pm \gamma_5) , \tag{1.11} $$

$$ f'_{L,R}(x) = \frac{1}{2} (1 \mp \gamma_5) f'(x) , \quad \bar{f}'_{L,R}(x) = \frac{1}{2} \bar{f}'(x) (1 \pm \gamma_5) . \tag{1.12} $$

It is helpful to define $T_3$ as the third component of the weak isospin from the SU(2) group
and $Y$ as the hypercharge from the U(1) group. Then the electric charge $Q$, $T_3$ and $Y$ are
connected by the Gell-Mann-Nishijima equation:

$$ Q = T_3 + \frac{Y}{2} . \tag{1.13} $$



We can see that the left-handed doublet with $T_3 = \pm 1/2$ and $Y = -1$ describes a
neutrino $f$ and a charged lepton $f'$. However, the right-handed singlet with $T_3 = 0$, $Y = -2$
contains only the charged lepton. This means that there is no right-handed component of
the neutrino.
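
The charge assignments above can be checked with a few lines of exact arithmetic. This is a minimal sketch that assumes the standard hypercharge values for leptons ($Y = -1$ for the left-handed doublet, $Y = -2$ for the right-handed charged lepton); the function and variable names are illustrative.

```python
from fractions import Fraction


def electric_charge(t3: Fraction, y: Fraction) -> Fraction:
    """Gell-Mann-Nishijima relation Q = T3 + Y/2, in units of |e|."""
    return t3 + y / 2


# (label, T3, Y): standard lepton assignments, assumed for this illustration.
LEPTON_STATES = [
    ("left-handed neutrino", Fraction(1, 2), Fraction(-1)),
    ("left-handed charged lepton", Fraction(-1, 2), Fraction(-1)),
    ("right-handed charged lepton", Fraction(0), Fraction(-2)),
]

if __name__ == "__main__":
    for label, t3, y in LEPTON_STATES:
        print(f"{label}: Q = {electric_charge(t3, y)}")
```

A hypothetical right-handed neutrino would need $T_3 = 0$ and $Q = 0$, hence $Y = 0$: it would carry no charge under either gauge group, which is consistent with its absence from the model.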

In order to express the Lagrangian for the full electroweak interaction with three generations
of massless lepton pairs, we start with the free field Lagrangian:

$$ L(x) = \sum_{j=1}^{3} i \bar{\psi}_j(x) \gamma^\mu \partial_\mu \psi_j(x) . \tag{1.14} $$

Introducing now in the free Lagrangian an $SU(2) \otimes U(1)$ gauge transformation given by

$$ \psi_j(x) \to \psi'_j(x) = e^{i \frac{\vec{\tau}}{2} \cdot \vec{\alpha}(x) + i Y_j \beta(x)} \psi_j(x) , \tag{1.15} $$

and a covariant derivative given by

$$ D^j_\mu = \partial_\mu - i g \frac{\vec{\tau}}{2} \cdot \vec{W}_\mu(x) - i g' Y_j B_\mu(x) , \tag{1.16} $$

we obtain the following interaction Lagrangian:

$$ L_I(x) = \sum_{j=1}^{3} i \bar{\psi}_j(x) \gamma^\mu D^j_\mu \psi_j(x) . \tag{1.17} $$

In equation 1.16 there are three vector bosons ($\vec{W}_\mu$) from the SU(2) generators, one ($B_\mu$)
from the U(1) generator, and four coupling constants ($g$ and $g' Y_j$, where $j = 1, 2, 3$).

After algebraic calculations, the electroweak interaction Lagrangian 1.17 can be decomposed
into a charged current term ($L_{CC}$) and a neutral current term ($L_{NC}$):

$$ L_I(x) = L_{CC}(x) + L_{NC}(x) . \tag{1.18} $$

The charged current term contains only terms from the left-handed doublet field:

$$ L_{CC}(x) = \frac{g}{2\sqrt{2}} \left\{ \bar{f}(x) \gamma^\mu (1 - \gamma_5) f'(x) W^+_\mu(x) + \text{Hermitian conjugate} \right\} . \tag{1.19} $$

We recognize in this expression that $W^+_\mu(x)$ is a linear combination of $W^1_\mu(x)$ and $W^2_\mu(x)$.
This Lagrangian describes the fermion interactions mediated by the $W^+$ and $W^-$ bosons.

Since the neutral current term of the electroweak interaction Lagrangian contains a linear
combination of the neutral vector fields $B_\mu(x)$ and $W^3_\mu(x)$, it can also be written as the sum
of two terms:

$$ L_{NC}(x) = L^A_{NC}(x) + L^Z_{NC}(x) . \tag{1.20} $$

The first term can be written as

$$ L^A_{NC}(x) = \sum_{j=1}^{3} \bar{\psi}_j(x) \gamma^\mu \left[ g \frac{\tau_3}{2} \sin\theta_W + g' Y_j \cos\theta_W \right] \psi_j(x) A_\mu(x) , \tag{1.21} $$

and the second term can be written as

$$ L^Z_{NC}(x) = \sum_{j=1}^{3} \bar{\psi}_j(x) \gamma^\mu \left[ g \frac{\tau_3}{2} \cos\theta_W - g' Y_j \sin\theta_W \right] \psi_j(x) Z_\mu(x) . \tag{1.22} $$

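In equations 1.21 and 1.22, $\theta_W$ is the Weinberg angle. Identifying $A_\mu$ with the photon
field, which couples to each fermion with a strength proportional to its electric charge, we then
write the following four equations:

$$ g \sin\theta_W = e , \tag{1.23} $$

$$ g' \cos\theta_W Y_1 = e (Q_f - 1/2) , \tag{1.24} $$

$$ g' \cos\theta_W Y_2 = e Q_f , \tag{1.25} $$

$$ g' \cos\theta_W Y_3 = e Q_{f'} . \tag{1.26} $$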

These equations represent the fundamentals of the Standard Model, to which we have to
add the equations that explain the gluon exchange interactions between the quarks. The
equations above explain the fermion interactions, as the coupling constants appear in these
equations. However, in these equations all fermions and bosons are massless. We know from
experiment that this is not the case. In section 1.3 we will discuss how spontaneous symmetry
breaking and the Higgs mechanism generate the mass for leptons and gauge bosons without
breaking the gauge invariance of the electroweak interaction Lagrangian.
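
The four relations between the couplings and the electric charges can be obtained in a few steps. The sketch below assumes that the photon term of the neutral current has the bracket $g \frac{\tau_3}{2} \sin\theta_W + g' Y_j \cos\theta_W$ and demands that this bracket equals $e$ times the electric charge for each field component.

```latex
\begin{align*}
\text{upper doublet component } (T_3 = +\tfrac{1}{2}): \quad
  & \tfrac{1}{2} g \sin\theta_W + g' Y_1 \cos\theta_W = e Q_f , \\
\text{lower doublet component } (T_3 = -\tfrac{1}{2}): \quad
  & -\tfrac{1}{2} g \sin\theta_W + g' Y_1 \cos\theta_W = e Q_{f'} , \\
\text{singlets } (T_3 = 0): \quad
  & g' Y_2 \cos\theta_W = e Q_f , \qquad g' Y_3 \cos\theta_W = e Q_{f'} .
\end{align*}
% Subtract the first two lines and use Q_f - Q_{f'} = 1, then substitute back:
\begin{align*}
g \sin\theta_W = e \left( Q_f - Q_{f'} \right) = e , \qquad
g' Y_1 \cos\theta_W = e \left( Q_f - \tfrac{1}{2} \right) .
\end{align*}
```

The condition $Q_f = Q_{f'} + 1$ is therefore what allows a single coupling $g$ to serve both members of the doublet.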

1.3 Spontaneous Symmetry Breaking 

Systems with infinite degrees of freedom and a Lagrangian invariant under a group G of 
transformations can have non-symmetric states through spontaneous symmetry breaking,
or a Higgs mechanism. 

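Starting with the electrodynamics Lagrangian we can imagine a toy Higgs mechanism [B] [7] [5] :

$$ \begin{aligned} L(x) = & -\frac{1}{4} F_{\mu\nu}(x) F^{\mu\nu}(x) \\ & + [(\partial_\mu + ieA_\mu(x)) \phi^\dagger(x)] \cdot [(\partial^\mu - ieA^\mu(x)) \phi(x)] \\ & - \mu^2 \phi^\dagger(x) \phi(x) - h [\phi^\dagger(x) \phi(x)]^2 , \end{aligned} \tag{1.27} $$

where $\phi(x)$ is a complex scalar field that interacts electromagnetically via the Abelian gauge field $A_\mu(x)$,
and where $h > 0$, $\mu^2 < 0$. It can be checked easily that equation 1.27 is invariant under the
following gauge transformations:

$$ \phi(x) \to \phi'(x) = e^{-i\alpha(x)} \phi(x) , \quad A_\mu(x) \to A'_\mu(x) = A_\mu(x) - \frac{1}{e} \partial_\mu \alpha(x) . \tag{1.28} $$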

The equations of motion produced using this Lagrangian have solutions that correspond to
the minimal energy, thus to the vacuum expectation values in the lowest order of perturbation
theory. Since $\mu^2 < 0$, besides the trivial solution $\phi(x) = 0$ there exists a set of degenerate
solutions with $|\phi|^2 = -\frac{\mu^2}{2h} = \frac{\lambda^2}{2}$, related by the underlying gauge symmetry, $\phi(x) = \frac{\lambda}{\sqrt{2}} e^{i\alpha}$ (see
Fig. 1.1). We are allowed to choose $\alpha$ such that $\phi'(x)$ is real. It implies that the lowest energy
state is $\phi(x) = \frac{\lambda}{\sqrt{2}}$.

Figure 1.1: Symmetry breaking depending on the $\mu^2$ parameter: $\mu^2 > 0$ on the left, $\mu^2 < 0$ on
the right [9].

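Then at first order we can write

$$ \phi'(x) = \frac{1}{\sqrt{2}} [\lambda + \phi_1(x)] , \quad \phi_2(x) = 0 , \quad A'_\mu(x) = B_\mu(x) . \tag{1.29} $$

Now we can substitute equation 1.29 into equation 1.27 and arrange the Lagrangian in
powers of $\phi_1(x)$:

$$ \begin{aligned} L(x) = & -\frac{1}{4} B_{\mu\nu}(x) B^{\mu\nu}(x) + \frac{1}{2} e^2 \lambda^2 B_\mu(x) B^\mu(x) \\ & + e^2 \lambda B_\mu(x) B^\mu(x) \phi_1(x) + \frac{1}{2} e^2 B_\mu(x) B^\mu(x) \phi_1^2(x) \\ & + \frac{1}{2} [\partial_\mu \phi_1(x) \partial^\mu \phi_1(x) + 2 \mu^2 \phi_1^2(x)] \\ & + \frac{\mu^2}{\lambda} \phi_1^3(x) + \frac{\mu^2}{4 \lambda^2} \phi_1^4(x) - \frac{1}{4} \mu^2 \lambda^2 . \end{aligned} \tag{1.30} $$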

We can now find physical meaning in each of these lines. The first line describes a massive
vector field with mass $|e\lambda|$ (the previously massless gauge boson has acquired mass). The second
line describes the interaction between the massive vector field and the neutral scalar
field, with coupling strengths $e^2 \lambda$ and $\frac{1}{2} e^2$. The third line corresponds to a free scalar
particle, called the Higgs particle, with a mass $M_h = \sqrt{-2\mu^2}$. The fourth line describes the
self-interaction of the scalar field.

The effect of the spontaneous symmetry breaking is that the initial four degrees of freedom
in equation 1.27 (two in the initial complex scalar field and two in the massless vector field)
are transformed into four other degrees of freedom (one in a real neutral scalar particle and
three in a massive gauge boson).
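
The scalar part of the expansion above can be verified numerically. The following sketch assumes a real field $\phi = (\lambda + \phi_1)/\sqrt{2}$ with $\lambda^2 = -\mu^2/h$, and compares the scalar term $-\mu^2 \phi^2 - h \phi^4$ of the Lagrangian with its expansion in powers of $\phi_1$. The function names and the sample parameter values are illustrative only.

```python
import math


def vacuum_value(mu2: float, h: float) -> float:
    """Return lambda, defined through lambda^2 = -mu^2 / h (needs mu2 < 0 < h)."""
    return math.sqrt(-mu2 / h)


def scalar_term(phi1: float, mu2: float, h: float) -> float:
    """Scalar term -mu^2 phi^2 - h phi^4 for real phi = (lambda + phi1) / sqrt(2)."""
    phi = (vacuum_value(mu2, h) + phi1) / math.sqrt(2.0)
    return -mu2 * phi ** 2 - h * phi ** 4


def expanded_scalar_term(phi1: float, mu2: float, h: float) -> float:
    """The same term arranged in powers of phi1 (mass, cubic, quartic, constant)."""
    lam = vacuum_value(mu2, h)
    return (mu2 * phi1 ** 2
            + (mu2 / lam) * phi1 ** 3
            + (mu2 / (4.0 * lam ** 2)) * phi1 ** 4
            - 0.25 * mu2 * lam ** 2)


def higgs_mass(mu2: float) -> float:
    """Mass read off from the quadratic term: +mu^2 phi1^2 = -(1/2) M_h^2 phi1^2."""
    return math.sqrt(-2.0 * mu2)


if __name__ == "__main__":
    mu2, h = -1.0, 0.5  # arbitrary illustrative values
    for phi1 in (-0.7, 0.0, 0.3, 1.2):
        print(phi1, scalar_term(phi1, mu2, h), expanded_scalar_term(phi1, mu2, h))
```

The absence of a term linear in $\phi_1$ confirms that the expansion is performed around a minimum of the potential, and the negative coefficient of $\phi_1^2$ in the potential energy sign convention corresponds to a positive squared mass $-2\mu^2$.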






1.4 The Higgs Mechanism and the Higgs Boson 



In a similar way, the Higgs mechanism can be applied to equation 1.10 to allow the W and
Z bosons to acquire mass. Two complex scalar fields introduced to break spontaneously
the symmetry of the gauge groups $SU(2) \otimes U(1)$ form an isodoublet with respect to the
SU(2) group:

$$ \phi(x) \equiv \begin{pmatrix} \phi^+(x) \\ \phi^0(x) \end{pmatrix} , \tag{1.31} $$

where the charged (neutral) component of the doublet is $\phi^+(x)$ ($\phi^0(x)$). This
creates a Higgs field potential where $h > 0$ and $\mu^2 < 0$:

$$ V_H(x) = -\mu^2 \phi^\dagger(x) \phi(x) - h [\phi^\dagger(x) \phi(x)]^2 . \tag{1.32} $$

From equation 1.29 we know that the neutral scalar field $\phi^0(x)$ has a vacuum expectation
value of $\frac{\lambda}{\sqrt{2}}$. Therefore, at first order we can rewrite equation 1.31 to

$$ \phi(x) = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ \lambda + \chi(x) \end{pmatrix} . \tag{1.33} $$

In this equation the SU(2) gauge freedom has been used explicitly: three of the four real components
of the field $\phi(x)$ are gone and only one real scalar field $\phi^0(x)$ remains, with
$\phi^0(x) = \frac{1}{\sqrt{2}} (\lambda + \chi(x))$.

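By putting together all the previous equations we obtain the SM part of the Lagrangian
that produces the mass of the $W^\pm$ and $Z^0$ bosons:

$$ \begin{aligned} L(x) = & \; \frac{1}{4} g^2 \lambda^2 W^\dagger_\mu(x) W^\mu(x) + \frac{1}{8} (g^2 + g'^2) \lambda^2 Z_\mu(x) Z^\mu(x) \\ & + \frac{1}{2} g^2 \lambda W^\dagger_\mu(x) W^\mu(x) \chi(x) + \frac{1}{4} g^2 W^\dagger_\mu(x) W^\mu(x) \chi^2(x) \\ & + \frac{1}{4} (g^2 + g'^2) \lambda Z_\mu(x) Z^\mu(x) \chi(x) + \frac{1}{8} (g^2 + g'^2) Z_\mu(x) Z^\mu(x) \chi^2(x) \\ & + \frac{1}{2} [\partial_\mu \chi(x) \partial^\mu \chi(x) + 2 \mu^2 \chi^2(x)] \\ & + \frac{\mu^2}{\lambda} \chi^3(x) + \frac{\mu^2}{4 \lambda^2} \chi^4(x) - \frac{1}{4} \mu^2 \lambda^2 . \end{aligned} \tag{1.34} $$

Equation 1.34 tells us that the $W^\pm$ bosons have acquired mass:

$$ m_{W^+} = m_{W^-} = \frac{1}{2} \lambda g , \tag{1.35} $$

and that the $Z^0$ boson has also acquired mass:

$$ m_Z = \frac{1}{2} \lambda \sqrt{g^2 + g'^2} = \frac{1}{2} \frac{\lambda g}{\cos\theta_W} . \tag{1.36} $$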



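We can see that some Standard Model parameters are now constrained by theory, such as:

$$ m_Z = \frac{m_W}{\cos\theta_W} > m_W , \tag{1.37} $$

$$ \frac{G_F}{\sqrt{2}} = \frac{g^2}{8 m_W^2} , \tag{1.38} $$

with $G_F$ the Fermi constant.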


However, the mass of the Higgs boson ($m_\chi = \sqrt{-2\mu^2}$, sometimes also written as $m_H$) is
not constrained by theory and has to be measured by experiments once the Higgs boson is
observed experimentally.
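
As a rough numerical illustration of how these tree-level constraints tie the parameters together, the sketch below takes the W and Z masses from Table 1.1 and the measured Fermi constant as inputs. It assumes the standard tree-level relations $\cos\theta_W = m_W / m_Z$, $G_F/\sqrt{2} = g^2/(8 m_W^2)$ and $m_W = \frac{1}{2} \lambda g$. Radiative corrections are ignored and the function names are illustrative.

```python
import math

G_FERMI = 1.1663787e-5  # Fermi constant in GeV^-2 (measured value)
M_W = 80.4              # W boson mass in GeV/c^2, as quoted in Table 1.1
M_Z = 91.2              # Z boson mass in GeV/c^2, as quoted in Table 1.1


def cos_theta_w(m_w: float, m_z: float) -> float:
    """Tree-level cosine of the Weinberg angle from m_Z = m_W / cos(theta_W)."""
    return m_w / m_z


def su2_coupling(m_w: float, g_fermi: float) -> float:
    """SU(2) coupling g from G_F / sqrt(2) = g^2 / (8 m_W^2)."""
    return math.sqrt(8.0 * m_w ** 2 * g_fermi / math.sqrt(2.0))


def vacuum_scale(m_w: float, g: float) -> float:
    """Vacuum parameter lambda (GeV) from m_W = lambda * g / 2."""
    return 2.0 * m_w / g


if __name__ == "__main__":
    g = su2_coupling(M_W, G_FERMI)
    print("cos(theta_W) =", cos_theta_w(M_W, M_Z))
    print("g            =", g)
    print("lambda [GeV] =", vacuum_scale(M_W, g))
```

With these inputs the coupling $g$ comes out at about 0.65 and $\lambda$ at about 246 GeV. The value of $\lambda$ depends only on $G_F$, since $m_W$ cancels between the last two relations.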

If a Yukawa coupling is added such as

$$ \begin{aligned} L_Y(x) = & \; c_{f'} \, (\bar{f}(x), \bar{f}'(x))_L \begin{pmatrix} \phi^+(x) \\ \phi^0(x) \end{pmatrix} f'_R(x) \\ & + c_f \, (\bar{f}(x), \bar{f}'(x))_L \begin{pmatrix} \bar{\phi}^0(x) \\ -\phi^-(x) \end{pmatrix} f_R(x) + \text{Hermitian conjugate} , \end{aligned} \tag{1.39} $$

where $f$ and $f'$ are the two fermions of the doublet in equation 1.10, and $f(x)$ and $f'(x)$
are the corresponding fields, the Higgs mechanism will produce a
term in the SM Lagrangian that will give mass both to fermions and antifermions, both to
leptons and quarks:



A 



(1.40) 



Here the constant $c_f$ is also not constrained by theory and is deduced from the measured
fermion masses. The mass of a fermion is equal to the mass of the corresponding
antifermion.



1.5 Physics Beyond the Standard Model 



The Standard Model of elementary particles and their interactions is the most precise physics
theory ever developed. No experiment has shown a convincing contradiction with the SM.
However, physicists believe that the SM is a low-energy approximation of a higher-energy
theory [10] [11], in much the same way that classical physics is a limiting case of quantum
mechanics. There are various aspects that are not yet explained by the Standard Model:
the disappearance of antimatter shortly after the Big Bang, the nature of dark matter, the mass
of neutrinos, why there are three generations of elementary particles, etc. In addition, in
order to have a Higgs boson mass at the electroweak scale, there has to be some mechanism
that cancels the radiative corrections to the Higgs boson mass. Elementary particle physicists
perform experiments at the Tevatron, the LHC and other particle physics laboratories
around the world to test the validity of the Standard Model, with the hope that new
phenomena will be observed. A new theory that would correctly describe these supposedly
new phenomena will be called a theory of physics beyond the Standard Model (BSM).

1.5.1 Supersymmetry 

A popular candidate BSM theory is the theory of supersymmetry (SUSY). It predicts that
every SM elementary particle has a corresponding partner that has not yet been observed
experimentally. This theory introduces a symmetry between fermions and bosons, predicting
that every SM fermion has a bosonic partner and every SM boson has a fermionic partner.
This allows for the cancellation of Higgs radiative corrections that would otherwise require
a very precise fine tuning that physicists find hard to accept. The supersymmetric partners
of SM elementary particles are called "superpartners". For example, the superpartners of
the top quark, the gluon, the $W^\pm$ and $Z^0$ bosons are called stop ($\tilde{t}$), gluino ($\tilde{g}$), gauginos
($\tilde{\chi}^\pm$) and gaugino ($\tilde{\chi}^0$), respectively.

If supersymmetry exists, it is a broken symmetry. If it were unbroken, the superpartners 
would have exactly the same mass as their SM partners and they would have been already 
observed experimentally. Therefore, if the superpartners exist, they must be more massive 
than their SM counterparts. 

The minimal supersymmetric extension of the Standard Model is called the Minimal
Supersymmetric Standard Model (MSSM). In the MSSM there would not be just one Higgs
boson, as predicted by the Standard Model, but five particles that play the role of the
Higgs boson, three neutral and two electrically charged: $h$, $H$, $A$, $H^+$ and $H^-$ [12]. The
lightest of the neutral Higgs particles ($h$) has very similar properties to those of the Standard
Model Higgs boson. This is why, if a Standard Model Higgs boson is discovered at the
Tevatron or the LHC, precise measurements of its properties would be necessary to check
if it is really a Standard Model Higgs boson or a SUSY one. All SUSY models predict the
existence of at least two Higgs bosons.

Although the LHC experiments will not reach the Tevatron's sensitivity for the low-mass Standard
Model Higgs search within approximately 1-2 years, they will be able
to improve upon the Tevatron's results for MSSM Higgs boson searches with 1 fb$^{-1}$ of
integrated luminosity at $\sqrt{s} = 7$ TeV, collected before the projected major
shutdown to prepare the LHC for $\sqrt{s} = 14$ TeV collisions.

1.5.2 Dynamic Electroweak Symmetry Breaking

The Standard Model and its supersymmetric extension explain the spontaneous symmetry
breaking by the introduction of a Higgs mechanism and the prediction of one scalar (spinless)
elementary particle recognized as a Higgs boson. However, they do not explain why there
should be in nature a scalar field with a non-zero vacuum expectation value. For this
reason, a new theory called "Technicolor", providing a dynamic reason for the electroweak
symmetry breaking, was created in 1979 by Weinberg [13] and Susskind [14]. Back then,
the most massive known fermion was the bottom quark, with a mass of approximately 5
GeV/$c^2$ [4]. The electroweak theory had been developed and it had predicted the existence
of the W (Z) gauge boson with a mass of approximately 80 (90) GeV/$c^2$, although they had
not yet been observed experimentally. Since the largest fermion mass was negligible with
respect to the gauge boson masses, the Technicolor theory explained the mass of the gauge
bosons and not the mass of the fermions. The Technicolor theory is very similar to the
QCD theory and introduces the dynamic spontaneous symmetry breaking in a similar way
as the spontaneous chiral symmetry breaking in QCD. Therefore, the Technicolor theory
introduces a new strong interaction and a new gauge group, and predicts the existence of new
elementary particles called techniquarks.

Since the Technicolor theory could not predict the mass of Standard Model fermions, it was 
an incomplete theory. The theory was extended and models called Extended Technicolor 
were introduced in order to explain also the fermion masses. However, precise experimental 
measurements revealed that the predictions of the theory were refuted for quarks as massive 
as the charm quark. The theory was therefore further refined and Walking Technicolor 
and Multi-Scale Technicolor emerged. Some combinations between Supersymmetry and
Technicolor theories also appeared. 

When the top quark was discovered in 1995 and was found to have the unexpectedly large
mass of approximately 175 GeV/$c^2$ [3], which is also the value of the weak scale
$v_{weak} = 1/\sqrt{2\sqrt{2} G_F} \approx 175$ GeV, where $G_F$ is the Fermi constant, it was suggested that the top
quark could play a special role in the spontaneous symmetry breaking in theories beyond
the Standard Model. A theory called Topcolor was created, but since it predicted a top
quark mass of 220 GeV/$c^2$, the theory was immediately refuted. However, when the Topcolor
and Technicolor theories were combined, a new theory called Topcolor Assisted Technicolor
was created, where the top quark is the first of the predicted techniquarks. This theory
explains the mass of all gauge bosons and fermions, including the heavy top quark. The
predictions of this theory can be checked at the Tevatron or the LHC.
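
The numerical coincidence between the weak scale and the top quark mass quoted above can be reproduced with a one-line calculation. The sketch below uses the measured Fermi constant as its only input; the function name is illustrative.

```python
import math

G_FERMI = 1.1663787e-5  # Fermi constant in GeV^-2 (measured value)


def weak_scale(g_fermi: float) -> float:
    """Weak scale v_weak = 1 / sqrt(2 * sqrt(2) * G_F), in GeV."""
    return 1.0 / math.sqrt(2.0 * math.sqrt(2.0) * g_fermi)


if __name__ == "__main__":
    print("v_weak [GeV] =", weak_scale(G_FERMI))
```

The result is roughly 174 GeV, which agrees with the quoted 175 GeV within rounding and lies close to the measured top quark mass.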

In conclusion, there are a variety of dynamic spontaneous symmetry breaking theories that 
have evolved with time. A well-written summary of such theories can be read in Reference

1.6 Summary



This chapter started by introducing elementary particle physics, the sub-domain of physics 
that studies the elementary particles, the smallest building blocks of matter, and the ele- 
mentary interactions or forces that allow the elementary particles to form bound states and 
thus create the structure of matter we see around us. We presented the two experimental 
methods to study elementary particles: accelerator particle physics and cosmic-ray parti- 
cle physics. We continued by presenting the advancement of particle physics theories from 
global gauge theories to local gauge theories. We then presented the current theory of parti- 
cle physics, a local gauge theory called the Standard Model of elementary particles and their 
interactions. The Standard Model has been confirmed by all the numerous experiments in 
elementary particle physics ever conducted. However, the theory by itself does not allow 
elementary particles to acquire mass. We know experimentally that elementary particles 
have masses, though, and if they had none, the Universe would be very different than it is 
today and we would not exist. 

In 1964 a new theory was proposed by the theorists Higgs, Englert, Brout, Guralnik, Hagen
and Kibble, which was later called the Higgs mechanism. The theory explained in an
elegant way the spontaneous symmetry breaking of the electroweak interaction into the
electromagnetic interaction and the weak interaction. This is equivalent to separating the
massless photon from the massive $Z^0$ boson, that is, to introducing a mechanism that allows
the $Z^0$ boson to acquire mass. The Higgs
mechanism predicts the existence of a scalar field that pervades the entire Universe, the
Higgs field. Each elementary particle couples to the Higgs field with a strength proportional
to its mass. The Higgs mechanism is a falsifiable theory, since it predicts the existence of
a new elementary particle, the Higgs boson, which is described uniquely by its mass. This
thesis will present an experimental search for the existence of the Standard Model Higgs
boson and therefore an experimental test of the Higgs mechanism and the Standard Model
of particle physics. We concluded the chapter by discussing several explanations of spontaneous
symmetry breaking in theories beyond the Standard Model, such as the Supersymmetry
and Technicolor theories.

In the next chapter we will present the summary of the direct and indirect searches for the 
Standard Model Higgs boson that have been performed until now at particle accelerators 
around the world, namely the LEP, the Tevatron and the LHC accelerators. We will also 
introduce and motivate the Higgs boson direct search presented in this dissertation. 
Chapter 2

Standard Model Higgs Boson 
Experimental Searches 



The Standard Model Higgs boson has not yet been observed experimentally. However, 
experimental limits on the Standard Model Higgs boson mass have been set both by direct 
and indirect searches. In this chapter we will introduce the direct searches at the LEP 
and Tevatron accelerators, as well as the prospects for the LHC accelerator. We will also 
present the indirect electroweak fits for the Higgs boson. Finally we will present our WH 
search at CDF at the Tevatron and present our motivation for choosing this search. In this 
dissertation, by a Higgs boson we mean a Standard Model Higgs boson. 

2.1 Direct Searches at the LEP Accelerator 

The Large Electron Positron Collider (LEP) collided electrons and positrons between 1989
and 2000 at centre-of-mass energies $\sqrt{s}$ between 189 and 209 GeV [16]. A total of 2461
pb$^{-1}$ of integrated luminosity of collision data was analyzed by each of the four detector
collaborations at LEP: OPAL, ALEPH, L3 and DELPHI. They looked for Higgs boson
production in association with a Z boson, where the Higgs boson decayed to a pair of
bottom-antibottom quarks and the Z boson decayed leptonically
($e^+e^- \to Z^0 H$, with $Z^0 \to \ell^+\ell^-$ or $Z^0 \to \nu\bar{\nu}$ and $H \to b\bar{b}$),
or where the Higgs boson decayed to a $\tau$ lepton pair and the Z
boson decayed to a generic quark pair ($e^+e^- \to Z^0 H$, with $Z^0 \to q\bar{q}$ and $H \to \tau^+\tau^-$).

The results from each experiment and channel were combined by the LEP Electroweak
Working Group and a Standard Model Higgs boson was excluded at 95% confidence level
(CL)¹ [4] for a mass less than 114.4 GeV/$c^2$, as we can see in Figure 2.1.

1 All CDF and Tevatron results presented in this thesis use a Bayesian statistical treatment in setting 
confidence levels denoted with CL. All other results presented from other experiments use a frequentist 
approach with confidence level also denoted with CL. 
Figure 2.1: The ratio $CL_s = CL_{s+b}/CL_b$ between the confidence levels for the signal plus
background hypothesis and the background only hypothesis as a function of the Higgs boson
mass [16]. The dashed line represents the median expectation. The green and yellow shaded
bands around the median expected curve correspond to the $1\sigma$ and $2\sigma$ probability bands.
The observed result is represented by the solid line.
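
As an illustration of the ratio $CL_s = CL_{s+b}/CL_b$, the following minimal sketch evaluates it for a single-bin counting experiment. This is a simplification and not the LEP procedure, which combines many channels with a likelihood-ratio test statistic. The event counts and the helper names `poisson_cdf` and `cls` are assumptions made only for this example.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)          # P(n = 0)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k            # P(n = k) from P(n = k - 1)
        total += term
    return total

def cls(n_obs, signal, background):
    """CL_s = CL_{s+b} / CL_b for a one-bin counting experiment."""
    cl_sb = poisson_cdf(n_obs, signal + background)   # signal + background hypothesis
    cl_b = poisson_cdf(n_obs, background)             # background-only hypothesis
    return cl_sb / cl_b

if __name__ == "__main__":
    background, n_obs = 3.0, 3      # hypothetical numbers, for illustration only
    for signal in (1.0, 3.0, 6.0):
        value = cls(n_obs, signal, background)
        verdict = "excluded at 95% CL" if value < 0.05 else "not excluded"
        print(f"s = {signal:4.1f}: CLs = {value:.3f} ({verdict})")
```

A signal hypothesis is excluded at 95% CL when $CL_s < 0.05$. Dividing by $CL_b$ protects against excluding a signal to which the experiment has little sensitivity when the data fluctuate below the background expectation.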



2.2 Electroweak Indirect Fits 



Although the Higgs boson mass cannot be predicted by theory, it can be constrained through 
fits on electroweak parameters that are measured by experiment. Radiative corrections due 
to the Higgs boson loops to the mass of the W and Z bosons or the top quark, as shown
in Figure 2.2, depend on the Higgs boson mass. Conversely, precise measurements of these
masses put an indirect constraint on the Higgs boson mass.

The contribution of the Higgs boson mass to the masses of the gauge bosons is obtained by taking
into account Feynman diagrams of radiative corrections, such as those in Figure 2.2, and is
given by the following formulae:



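$$\rho = \frac{M_W^2}{M_Z^2\,(1 - \sin^2\theta_W)} \equiv 1 + \Delta\rho, \qquad (2.1)$$

$$\Delta\rho \equiv \frac{3 G_F}{8\pi^2\sqrt{2}}\, M_t^2 + \frac{\sqrt{2}\, G_F}{16\pi^2}\, M_W^2 \left[ \frac{11}{3} \ln\!\left(\frac{M_h^2}{M_W^2}\right) + \cdots \right] + \cdots \qquad (2.2)$$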
Figure 2.2: Radiative loop contribution to masses of electroweak objects [17]. Precision
measurements of the gauge bosons and of the top quark mass can provide a limit on the SM
Higgs boson mass.



where $G_F$ is the Fermi coupling constant, $\theta_W$ is the Weinberg angle, and $M_t$, $M_W$, $M_Z$ and $M_h$
are, respectively, the masses of the top quark, the W and Z bosons and the Higgs boson.
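
To give a feeling for the sizes involved, the sketch below evaluates the leading top-quark term of the radiative correction, $3 G_F M_t^2 / (8\pi^2\sqrt{2})$, and the logarithm $\ln(M_h^2/M_W^2)$ that carries the Higgs mass dependence. The value used for $G_F$ is an assumed input that is not quoted in the text; the top quark and W boson masses are the ones used in this chapter, and the function names are hypothetical.

```python
import math

G_F = 1.16637e-5    # Fermi coupling constant in GeV^-2 (assumed input)
M_T = 173.1         # top quark mass in GeV/c^2, as used in this chapter
M_W = 80.420        # W boson mass in GeV/c^2, as used in this chapter

def delta_rho_top(m_top):
    """Leading top-quark term: 3 G_F m_t^2 / (8 pi^2 sqrt(2))."""
    return 3.0 * G_F * m_top**2 / (8.0 * math.pi**2 * math.sqrt(2.0))

def higgs_log(m_higgs):
    """Logarithm through which the Higgs boson mass enters: ln(M_h^2 / M_W^2)."""
    return math.log(m_higgs**2 / M_W**2)

if __name__ == "__main__":
    print(f"top-quark term: {delta_rho_top(M_T):.4f}")
    for m_h in (100.0, 200.0, 400.0):
        print(f"M_h = {m_h:5.0f} GeV/c^2: ln(M_h^2/M_W^2) = {higgs_log(m_h):.2f}")
```

The top-quark term grows quadratically with $M_t$, whereas doubling $M_h$ only adds $\ln 4 \approx 1.4$ to the logarithm. This is why the indirect constraint on the Higgs boson mass is much weaker than the one historically obtained on the top quark mass.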

Figure 2.3 [18] shows predictions of the Higgs boson mass as a function of the W boson mass
$M_W$ and the top quark mass $M_t$. The dashed (solid) line represents indirect
constraints on $M_t$ and $M_W$ from the LEP-I and SLD (LEP-II and Tevatron-II)
experiments. Both contours are plotted at 68% CL. The green contour represents the allowed
phase space for $M_t$ and $M_W$ as a function of the Higgs boson mass. We can see that the two
contours agree and they suggest a low mass Higgs boson, just beyond the direct lower mass
limit set by the LEP experiments. The arrow labelled $\Delta\alpha$ shows the global variation if
$\alpha(M_Z)$ is changed by one standard deviation. This variation gives an additional uncertainty
to the SM band shown in the figure.

Figure 2.4 conveys the same message in a different manner. The plot presents the quality
of the Standard Model fit ($\Delta\chi^2$) as a function of the Higgs boson mass. The Higgs mass
preferred by the fit is the one that minimizes $\Delta\chi^2$. The latest fit is produced by the
LEP Electroweak Working Group [19] using $m_t = 173.1 \pm 1.3$ GeV/$c^2$ [20] and
$m_W = 80.420 \pm 0.031$ GeV/$c^2$ [21]:

m H = 87±H GeV/c 2 (2.3) 

and the 95% confidence level upper limit is 

m H = 158 GeV/c 2 [19]. (2.4) 
Figure 2.3: SM relationship between $m_t$, $m_W$ and $m_H$ [18]. The dashed (solid) line represents
indirect constraints on $m_t$ and $m_W$ from the LEP-I and SLD (LEP-II and Tevatron-II)
experiments. The white band represents the Higgs boson mass interval excluded at 95%
CL by the Tevatron accelerator in July 2010. Both contours are plotted at 68% CL. The
green contour represents the allowed phase space for $m_t$ and $m_W$ as a function of the Higgs
boson mass. We can see that the two contours agree and they suggest a low mass Higgs
boson, just beyond the direct lower mass limit set by the LEP experiments. The arrow
labelled $\Delta\alpha$ shows the global variation if $\alpha(m_Z)$ is changed by one standard deviation.
This variation gives an additional uncertainty to the SM band shown in the figure.



We see that the Higgs boson mass preferred by the fit is excluded by the direct searches at the LEP
experiments. If this exclusion is taken into account and a new fit is performed, then the
95% confidence level upper limit increases to 185 GeV/$c^2$ [19].



2.3 Direct Searches at the Tevatron 



The Tevatron accelerator at the Fermi National Accelerator Laboratory (FNAL) in Batavia,
Illinois (in the suburbs of Chicago), USA, collides protons and antiprotons at a
centre-of-mass energy of 1.96 TeV. For almost two decades, the collisions at the Tevatron were both
the most energetic and those with the highest instantaneous luminosity in the world. However,
in March 2010 the Large Hadron Collider (LHC) at the European Organization for Nuclear
Research (CERN) in Geneva, Switzerland, broke the Tevatron's record for centre-of-mass
energy. As of March 2010, the LHC collides protons with protons at a centre-of-mass energy
of 7 TeV, with a predicted 14 TeV energy in about five years. As of May 2011, the LHC has
demonstrated instantaneous $pp$ luminosities in excess of $10^{33}$ cm$^{-2}$ s$^{-1}$, more than double the
highest $p\bar{p}$ luminosity achieved by the Tevatron. In this dissertation we present a Standard
Model Higgs boson search at the Tevatron.

Figure 2.4: The plot presents the quality of fit to electroweak precision data versus Higgs
boson mass [15]. The Higgs boson mass range excluded through direct searches at LEP and
Tevatron experiments is shown in yellow. The solid dark blue line is the nominal fit and the light
blue band represents the theoretical uncertainties on the fit. The 68% confidence level
is at $\Delta\chi^2 = 1$ and the 95% confidence level is at $\Delta\chi^2 = 2.7$. The dashed and dotted curves
represent fit results with slightly different input parameters, such as different theoretical
calculations of the vacuum polarization ($\Delta\alpha^{(5)}_{\mathrm{had}}$) and values for the W boson mass obtained
with low $Q^2$ experiments.

2.3.1 Higgs Boson Production at the Tevatron 

There are various processes predicted by the Standard Model through which a Higgs boson 
can be produced at the Tevatron accelerator. The cross sections of these processes vary with 
the centre-of-mass energy of collisions. 

At the Tevatron, the relative cross sections can be seen in Figure 2.5. The most likely process is
gluon-gluon fusion. Next in line, and about ten times less likely, is the associated production
of a W boson and a Higgs boson. Next comes the associated production of a Z
boson and a Higgs boson, which is about half as likely as the previous process. These
are followed by vector-boson fusion and the associated production of a top-quark pair
and a Higgs boson, which have almost negligible cross sections.



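[Plot: SM Higgs production cross sections $\sigma$ [fb] as a function of $m_H$ [GeV], from 100 to
200 GeV, at the Tevatron Run II, for the processes $gg \to H$, $q\bar{q} \to WH$, $q\bar{q} \to ZH$,
$qq \to qqH$, $b\bar{b} \to H$ and $gg, q\bar{q} \to t\bar{t}H$. Source label: TeV4LHC Higgs working group.]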

Figure 2.5: SM Higgs production cross sections for $p\bar{p}$ collisions at a centre-of-mass energy
of 1.96 TeV [22].



2.3.2 Higgs Boson Decay 



The Standard Model predicts the Higgs boson decay modes and their branching ratios, which are
independent of the way the Higgs boson was produced² and depend only on the Higgs
boson mass, as we can see in Figure 2.6.

Higgs boson masses up to 114.4 GeV/$c^2$ have been excluded at 95% CL by direct searches at
experiments at the LEP accelerator. All Higgs boson masses from 186 GeV/$c^2$ upward
have also been excluded at 95% CL by indirect electroweak fits. The remaining interval
of possible Higgs boson masses is divided into values below and above 135 GeV/$c^2$. If its mass
lies in the interval 114.4-135 GeV/$c^2$ (also called the "low mass range"), the Higgs boson
decays predominantly to a pair of bottom-antibottom quarks ($H \to b\bar{b}$). On the other hand,
if its mass lies in the interval 135-186 GeV/$c^2$ (also called the "high mass range"), the Higgs
boson decays predominantly to a pair of W bosons ($H \to WW^*$)³.

A W boson can decay leptonically to a charged lepton and its corresponding neutrino
($W \to e\nu_e$, $W \to \mu\nu_\mu$ or $W \to \tau\nu_\tau$) or hadronically to a quark-antiquark pair ($W \to q\bar{q}$).

² Be it at the LHC in $pp$ collisions at $\sqrt{s} = 7$ TeV or at the Tevatron in $p\bar{p}$ collisions at
$\sqrt{s} = 1.96$ TeV, the Higgs boson decays in the same way, depending only on its mass. However,
the production cross section increases with the centre-of-mass energy.

³ Since both W bosons cannot be on shell for a Higgs boson mass below approximately 160 GeV/$c^2$, one
of the W bosons will be a virtual particle, hence the notation with "*".
Figure 2.6: Branching ratios for the main decays of the SM Higgs boson [10].

2.3.3 Low Mass Direct Searches at the Tevatron

At the Tevatron, quark pair production through QCD processes has a cross section
about ten orders of magnitude larger than that of the Higgs boson processes. Quarks hadronize
very quickly and they are observed in particle detectors as collimated streams of subatomic
particles called jets. The key therefore is that at least one W or Z boson originating from
a Higgs boson event is observed in the detector through its leptonic decay products.

Since an event with a Higgs boson produced through gluon fusion that subsequently decays
to a bottom-quark pair ($gg \to H \to b\bar{b}$) does not present any charged lepton, it is practically
impossible to observe a low mass Higgs boson produced through gluon fusion due to the
much larger backgrounds.

However, the process next in line in terms of cross section (WH associated production) can
be observed when the W boson is reconstructed through its leptonic decay:
$q\bar{q} \to W^* \to WH \to \ell\nu_\ell\, b\bar{b}$. The charged lepton (an electron or a muon) is reconstructed in the detector
and the neutrino is seen as missing transverse energy. The compromise is therefore that a
process with a ten times smaller signal cross section brings the advantage of a high QCD background
rejection, thanks to the presence in the Higgs boson event of a W boson that is reconstructed
through its leptonic decay.

A similar channel is the ZH associated production. The Z boson is also reconstructed
through its leptonic decay, either to a pair of charged leptons ($Z \to \ell^+\ell^-$) or to a pair of
neutrinos ($Z \to \nu\bar{\nu}$). The sensitivity of this channel is similar to, but lower than, that of the
WH channel.

It is also possible to search for a Higgs boson produced in gluon fusion that decays to a pair
of tau leptons ($gg \to H \to \tau^+\tau^-$). A tau lepton decays either leptonically or hadronically.
By asking for one tau lepton to decay leptonically and the other one to decay hadronically,
this channel is also somewhat sensitive at the Tevatron, although less so than the associated
production modes.

2.3.4 High Mass Direct Searches at the Tevatron 

If its mass is in the high mass range, a Higgs boson decays predominantly to a pair of W bosons.
Reconstructing both W bosons significantly reduces the QCD and electroweak backgrounds. This search is the
most sensitive at the Tevatron in the high mass range.

The most recent high mass direct search from CDF was performed using an integrated
luminosity of 7.2 fb$^{-1}$ and is presented in the top part of Figure 2.7. When combined with the
high mass analyses at DZero, which use an integrated luminosity of up to 8.2 fb$^{-1}$, a Tevatron
high mass Higgs boson combination result is achieved, which is presented in the bottom
part of Figure 2.7. These results were made public on 7 March 2011 and were presented at
the conferences of Winter 2011 [25].

2.3.5 This Analysis: SM WH Search at CDF II 

Our motivation for choosing a Standard Model Higgs boson search for this PhD thesis,
although a SUSY Higgs search would have been possible as well⁴, is that we feel that
priority should be given to the effort of discovering or excluding a Standard Model Higgs
boson. The Tevatron accelerator was chosen because it was the only particle accelerator in
the world to have been and still be sensitive to the Standard Model Higgs boson on the timescale
of the thesis. Given that electroweak indirect fits favour a light Higgs over a more massive
one, this thesis performs a low mass Higgs boson search. The most sensitive low mass Higgs
boson search at the Tevatron is the associated production WH, where the W boson decays
leptonically and the Higgs boson decays to a pair of bottom quarks. The leading-order
Feynman diagram of the WH — > Ivbb process is presented in Figure RTT1 This is the analysis 
presented in this dissertation. 

2.3.6 Combination of all CDF SM Higgs Analyses 

The WH search is one of the most sensitive low mass Standard Model Higgs boson modes
at CDF, but not the only one. In July 2010 CDF combined all Higgs searches, both at
low and high mass, to obtain an improved CDF Standard Model upper limit on the cross
section [24]. The expected⁵ and observed upper limits of the individual analyses are presented in
the top part of Figure 2.8. My work contributed directly to the WH search using exactly
two tight jets (in red) and therefore also to the combined CDF result, which is
presented in detail in the bottom part of Figure 2.8.

⁴ There are other analyses at CDF and also at the DZero, ATLAS and CMS detectors searching for a
SUSY Higgs boson.

The CDF Higgs combination from July 2010 improves and supersedes the same result from
August 2009 [25], to which my work also contributed directly as part of the WH search (in
red). The CDF Higgs combination from August 2009 is presented in Figure 2.9.
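
The expected limits quoted in these combinations are medians over many background-only pseudo-experiments. The sketch below illustrates that idea for a single counting experiment with a known background and a flat prior on the signal yield. It is a toy under those stated assumptions and not the CDF limit-setting code, which uses binned discriminant distributions and systematic uncertainties. The function names, the background value and the number of pseudo-experiments are hypothetical.

```python
import math
import random
import statistics

def bayesian_upper_limit(n_obs, background, cred=0.95, s_max=50.0, steps=5000):
    """Upper limit on the signal yield s at the given credibility level.

    Assumes one Poisson counting experiment with known background and a flat
    prior on s >= 0, so the posterior is proportional to the likelihood.
    """
    ds = s_max / steps
    weights = []
    for i in range(steps):
        mu = (i + 0.5) * ds + background               # expected events at the bin centre
        weights.append(math.exp(n_obs * math.log(mu) - mu))  # likelihood up to a constant
    target = cred * sum(weights)
    running = 0.0
    for i, weight in enumerate(weights):
        running += weight
        if running >= target:
            return (i + 1) * ds                        # upper edge of the bin reaching the target
    return s_max

def sample_poisson(mean, rng):
    """Draw a Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def expected_limit(background, n_pseudo=1000, seed=12345):
    """Median upper limit over background-only pseudo-experiments."""
    rng = random.Random(seed)
    cache = {}                                         # one limit per distinct pseudo-data count
    limits = []
    for _ in range(n_pseudo):
        n_pseudo_data = sample_poisson(background, rng)
        if n_pseudo_data not in cache:
            cache[n_pseudo_data] = bayesian_upper_limit(n_pseudo_data, background)
        limits.append(cache[n_pseudo_data])
    return statistics.median(limits)

if __name__ == "__main__":
    b = 3.0                                            # made-up background expectation
    print(f"expected limit on s: {expected_limit(b):.2f} events")
    print(f"observed limit for n = 5: {bayesian_upper_limit(5, b):.2f} events")
```

Dividing such a limit on the signal yield by the yield predicted by the Standard Model gives a limit in units of the SM prediction, which is the quantity shown on the vertical axis of the limit plots.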

2.3.7 Combination of all Tevatron SM Higgs Analyses 

In order to achieve a higher sensitivity for the Standard Model Higgs boson search, all the
low and high mass search channels, both from CDF and DZero, are combined to create one
Tevatron cross section times branching ratio plot. The latest Tevatron combination dates
from July 2010 and can be seen in the top part of Figure 2.10. It excludes at 95% CL a
SM Higgs boson with a mass in the range 158-175 GeV/$c^2$ [55]. In the low mass region,
the limit is about 2 times the SM prediction. This result improves upon and supersedes
the previous Higgs Tevatron combination result from November 2009, which had excluded
the interval 163 GeV/$c^2$ to 166 GeV/$c^2$ [27] and which can be seen in the bottom part of
Figure 2.10. My work contributed directly to both of these combined Higgs searches.
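
An exclusion interval is read off such a plot as the set of mass points where the observed upper limit, expressed in units of the SM prediction, falls below 1. A minimal sketch of that reading follows. The numbers are made up for illustration and are not the Tevatron results, and the function name `excluded_masses` is hypothetical.

```python
def excluded_masses(masses, observed_limits):
    """Return the mass points whose observed limit (in units of the SM) is below 1."""
    return [m for m, limit in zip(masses, observed_limits) if limit < 1.0]

# Made-up numbers for illustration only; they are NOT the Tevatron results.
masses = [150, 155, 160, 165, 170, 175, 180]      # GeV/c^2
observed = [1.6, 1.2, 0.9, 0.7, 0.8, 1.1, 1.5]    # observed limit / SM prediction

if __name__ == "__main__":
    print("excluded mass points:", excluded_masses(masses, observed))
```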

2.4 Expected Higgs Boson Production at the LHC 

The only other particle accelerator in the world that can search for a Standard Model Higgs
boson is the Large Hadron Collider (LHC) at the CERN laboratory in Europe. At the
moment, the LHC is colliding protons with protons at a centre-of-mass energy of $\sqrt{s} = 7$ TeV.
The current run is scheduled to end in 2012, with a "technical stop" at the end of 2011. At
least 1 fb$^{-1}$ of integrated luminosity is expected to be collected during this run by each LHC
experiment. This run will be followed by a shutdown lasting about two years, when the LHC
will be upgraded to be able to run at the design centre-of-mass energy of $\sqrt{s} = 14$ TeV.

Various processes have different cross sections as a function of the colliding particles and 
the centre-of-mass energy. Due to similar behaviour for background processes and different 
behaviour for signal processes, as seen in Figure |2~. Ill the most sensitive channels for a Higgs 
boson discovery at the LHC are different from those at the Tevatron. A compilation of the
most up-to-date cross section values for the SM Higgs boson can be found in Reference [32].

The ATLAS [28] and CMS [29] experiments made public their first searches for the
Standard Model Higgs boson at the winter conferences of 2011. However, none of those
results was as sensitive as the Tevatron combination result from the summer of 2010.

⁵ Expected limits measure the sensitivity of the analysis and are computed as the median of a distribution
of many pseudo-experiments using pseudo-data, as explained in Section 10.5.

2.5 Summary



In this chapter we have presented the history of direct searches for the Standard Model
Higgs boson at the LEP, Tevatron and LHC accelerators, as well as the constraints on
the Higgs boson mass from fits between precise experimental electroweak measurements
and the Standard Model prediction. Since the Standard Model Higgs boson has not yet
been observed experimentally and since it is uniquely described by its mass, experimental
particle physicists have excluded at 95% CL the existence of the Higgs boson for masses
in certain ranges. For example, in 2000 the combination of several direct searches
from four collaborations at the LEP accelerator at CERN excluded at 95% CL Higgs
bosons with masses smaller than 114.4 GeV/$c^2$. In the summer of 2010, the combination of
several direct searches from the CDF and DZero collaborations at the Tevatron accelerator
at Fermilab excluded at 95% CL Higgs boson masses in the range of 158-175 GeV/$c^2$.
There were also indirect fits of the measured masses of the W boson and top quark
to the Standard Model predictions that exclude at 95% CL Higgs boson masses with values
larger than 185 GeV/$c^2$.

At the winter conferences of 2011 the first searches for the Standard Model Higgs boson
from the ATLAS and CMS collaborations at the LHC accelerator at CERN were
made public, but they are not yet more sensitive than the Tevatron results. However, the
situation is expected to change in 2012, when the LHC experiments will take the lead from
the Tevatron and, within a few more years, will observe or exclude the Standard Model
Higgs boson over its entire available mass range.

This dissertation presents a direct search for the Standard Model Higgs boson by the Collider
Detector at Fermilab collaboration at the Tevatron accelerator at Fermilab. The allowed
Higgs mass interval may be divided into a low mass region and a high mass region, depending
on the most probable decay channel. Since indirect electroweak fits suggest a Higgs mass
in the low mass region, we chose to perform the most sensitive search at the Tevatron for a
low mass Higgs, which is the associated production of a W boson and a Higgs boson.
The leptonic decay of the W boson allows us to reconstruct in the detector the electron or
muon candidate, which considerably increases the signal over background ratio and therefore
the sensitivity of the analysis.

In the next chapter we will present the experimental infrastructure used to perform the WH 
search. 

Figure 2.7: Expected and observed upper limits normalized to the Standard Model (× SM)
for a high mass Higgs boson search for CDF (top) and CDF and DZero (bottom), where at
most 8.2 fb$^{-1}$ is used, as of March 2011 [23]. The limits are presented as a function of the
Higgs boson mass, between 110 GeV/$c^2$ and 200 GeV/$c^2$. The horizontal line at 1 represents
the Standard Model prediction. The observed upper limits are represented by solid black
lines. The expected median upper limits are represented by the dashed black lines. The
green (yellow) band represents the 1 (2) standard deviation interval around the expected
median upper limit represented by dashed lines.

Figure 2.8: Expected and observed upper limits normalized to the Standard Model (× SM)
for a Higgs boson search for CDF individual analyses using between 5.6 and 5.9 fb$^{-1}$ and
their combined result, as a function of the Higgs boson mass, between 100 GeV/$c^2$ and
200 GeV/$c^2$, as of July 2010 [24]. The horizontal line at 1 represents the Standard Model
prediction. The yellow (top) or pink (bottom) band is the region excluded through a direct
search combining searches at all the experiments at the LEP accelerator. The observed upper
limits are represented by solid lines. The expected upper limits are represented by the
dashed lines. Besides the individual analyses used in this combination, the combined upper
limit is represented in dark red in the top plot. In the bottom plot, the green (yellow) band
represents the 1 (2) standard deviation interval around the expected upper limit represented
by dashed lines.

Figure 2.9: Expected and observed upper limits normalized to the Standard Model (× SM)
for a Higgs boson search for CDF individual analyses using between 2.0 and 4.8 fb$^{-1}$ and
their combined result, as a function of the Higgs boson mass, between 100 GeV/$c^2$ and
200 GeV/$c^2$, as of August 2009 [25]. The horizontal line at 1 represents the Standard Model
prediction. The yellow (top) or pink (bottom) band is the region excluded through a direct
search combining searches at all the experiments at the LEP accelerator. The observed upper
limits are represented by solid lines. The expected upper limits are represented by the
dashed lines. Besides the individual analyses used in this combination, the combined upper
limit is represented in dark red in the top plot. In the bottom plot, the green (yellow) band
represents the 1 (2) standard deviation interval around the expected upper limit represented
by dashed lines.

Figure 2.10: Expected and observed upper limits normalized to the Standard Model
prediction for a Higgs boson search in a combination of searches both at CDF and DZero using
between 5.6 and 5.9 fb$^{-1}$, as of July 2010 [26] (top) and November 2009 [27] (bottom). The horizontal
line at 1 is the Standard Model prediction. The dashed line represents the expected limit.
The green (yellow) band represents the one (two) standard deviation interval around the
expected limit. The solid line is the measured limit. The 95% CL Standard Model exclusion
intervals are represented by the Higgs boson masses where the measured line goes below
the Standard Model line. In July 2010 a SM Higgs boson was excluded at 95% confidence
level in the interval 158-175 GeV/$c^2$, which supersedes the interval of 163-166 GeV/$c^2$ of
November 2009.

Chapter 3

Experimental Infrastructure 



The Higgs boson search presented in this thesis is performed using the Collider Detector at 
Fermilab (CDF) that surrounds one of the two collision points at the Tevatron accelerator 
hosted at Fermilab, Batavia, IL, USA. In this chapter we will present the experimental 
infrastructure of our analysis, namely the accelerator complex at Fermilab and the CDF 
detector. 



3.1 The Fermi National Accelerator Laboratory 



The Fermi National Accelerator Laboratory (also known as Fermilab or FNAL) was founded
by Robert R. Wilson (1914-2000) to be the largest particle physics laboratory in the US¹.
As of January 2011, Fermilab remains the only national particle physics laboratory in the
US, as other former particle physics laboratories have switched to related fields of science².

Fermilab hosts a multi-stage accelerator complex, of which the Tevatron is just
the most energetic accelerator. The Tevatron performs proton-antiproton collisions that are
recorded by two general purpose detectors, CDF and DZero. These collisions are used for
fundamental particle physics research. Fermilab also hosts a series of fixed target
experiments, as beams of protons are extracted from the Tevatron and collided with fixed targets.
In this process secondary beams of mesons, muons or neutrinos are produced.

¹ Besides the scientific aspect, Fermilab offers many other things to the US in general and its local
community in particular. Fermilab owns a very large area of land where the prairie has been restored to what it
used to be a couple of hundred years ago. In addition, a bison herd is raised on a large farm at Fermilab.
Fermilab also hosts numerous lakes where a number of migratory birds take refuge, especially Canada
geese. The public is allowed to walk or bike in the natural environment at Fermilab. Fermilab is also a strong
supporter of science communication to the general public. Its second director, Leon Lederman, created a
science outreach centre at Fermilab. In addition, there are Saturday morning physics lectures and other
public lectures on the latest developments in science and science policy. I had the chance to spend many
months at Fermilab and I took great pleasure in enjoying all these various opportunities that Fermilab has
to offer.

2 For example, the Brookhaven National Laboratory hosts the RHIC accelerator that performs heavy-ion 
collisions and studies the quark-gluon plasma that existed shortly after the Big Bang, and the Stanford 
Linear Accelerator Center (SLAC) now uses its electron accelerator to produce free electron lasers. 

3.2 Fermilab Accelerator Complex



The Tevatron is a proton-antiproton storage ring where $p\bar{p}$ collisions are made to occur at a
centre-of-mass energy $\sqrt{s} = 1.96$ TeV. Its first collisions were achieved in 1986 and since then
a series of improvements allowed for many increases in collision energy and instantaneous
luminosity. The first physics run (Run I) used $\sqrt{s} = 1.8$ TeV and took place between 1992
and 1996. Between 1997 and 2001, both the accelerator complex and the particle detectors
were improved dramatically. While the Tevatron accelerated only 6 bunches of protons or
antiprotons in Run I, it accelerates 36 in Run II. Also, while the time interval between two
consecutive bunch crossings was 3500 ns in Run I, it decreased to 396 ns in Run II.

Run II started in 2001 and as of January 2011 is still in progress. Initially the Tevatron was
scheduled to shut down in 2009, when the LHC was supposed to start its physics program at
$\sqrt{s} = 14$ TeV and higher instantaneous luminosity. However, delays in the LHC construction
and an incident that stopped the LHC for an extra year motivated prolonging Run II at
the Tevatron until September 2011. Furthermore, the LHC plan to run for about two years at
$\sqrt{s} = 7$ TeV and then have a major shutdown of one and a half years to prepare the LHC for
$\sqrt{s} = 14$ TeV has prompted Fermilab to seek approval and financing for a Run III for the
Tevatron from September 2011 through August 2014.

The main motivation is that the Tevatron has exhibited excellent performance and is running
smoothly, breaking instantaneous luminosity records every month. The Fermilab Accelerator
Complex is very well understood after about 25 years of usage. Run II was initially designed
to collect 2 fb$^{-1}$ of integrated luminosity and over 10 fb$^{-1}$ have already been delivered, with
an expected 12 fb$^{-1}$ before the end of Run II in 2011.

Acceleration of protons and antiprotons to collision energies is realized by a complex of
eight accelerators, two linear (Cockcroft-Walton and Linac) and six circular synchrotrons
(Booster, Main Injector, Debuncher, Accumulator, Recycler and Tevatron). This huge
accelerator complex consumes 30 MW of electric power and stretches over 9 km.

Proton-antiproton collisions take place at two points around the Tevatron storage ring, in
two buildings called BZero and DZero. A general purpose particle detector surrounds each
collision point: the Collider Detector at Fermilab (CDF) in the BZero building and the
DZero experiment in the DZero building.

The accelerator complex at Fermilab [30] and the CDF and DZero experiments are shown
schematically in Figure 3.1 and from a bird's eye view in Figure 3.2. The proton and antiproton
sources and the various acceleration stages are described in the sections below.

Figure 3.1: Schematic view of the Fermilab Accelerator Complex ("Fermilab's Accelerator Chain"),
where protons and antiprotons are produced, accelerated in subsequent steps and collided in two points around
the Tevatron storage ring. Image credit: CDF collaboration.

3.2.1 Proton Source 

First, protons have to be produced. A strong electric field ionizes hydrogen atoms at room
temperature (0.04 eV/atom) and sends protons and electrons in opposite directions. The
protons fall on and stick to a cesium surface. The work needed to free an electron from
a cesium surface is smaller than in the case of any other element, since cesium is the most
reactive one. An incoming atom may collide with a group consisting of a proton and two
electrons that are temporarily together on the cesium surface. The group is thus freed from
the surface and it forms a negative hydrogen ion ($H^-$). Thanks to the same electric
field, a continuous $H^-$ beam of about 25 keV is collected.

The $H^-$ beam is accelerated by a Cockcroft-Walton accelerator to an energy of 750 keV by
a constant electric field. The acceleration voltage is limited by the fact that at high voltages
the air creates sparks.

Figure 3.2: Bird's Eye View of the Fermilab Accelerator Complex, looking East. Credit 
image to the CDF collaboration. 

The H⁻ beam is subsequently accelerated to 400 MeV by a 130 m long linear accelerator 
called the Linac. The Linac uses alternating current and resonant frequency cavity 
technology. The continuous beam is therefore bunched up. When outside a cavity, a bunch 
is accelerated by an electric field. When inside a cavity, a bunch does not see the electric 
field, which now points in the opposite direction, and therefore is not decelerated. As the 
H⁻ particles acquire momentum, the cavities and gaps become longer to provide constant 
acceleration. A typical bunch contains 1.5 × 10⁹ particles. A typical pulse contains 4,000 
bunches, a total of 6 × 10¹² particles, and a typical pulse length is 20 μs. While in the 
Linac, a particle is accelerated by an electric field of 3 MV/m. The beam power is 18 MW 
when the pulsed H⁻ ion beam exits the Linac. 
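
The Linac figures quoted above can be cross-checked with simple arithmetic. The Python sketch below uses only numbers from the text plus the standard eV-to-joule conversion; the variable names are illustrative, and the pulse duration is derived here from the quoted beam power rather than taken as an input.

```python
# Back-of-the-envelope check of the Linac figures quoted in the text.
EV_TO_JOULE = 1.602176634e-19   # exact SI value of the elementary charge

particles_per_bunch = 1.5e9
bunches_per_pulse = 4000
exit_energy_ev = 400e6          # kinetic energy at the Linac exit
injection_energy_ev = 750e3     # kinetic energy delivered by the Cockcroft-Walton
linac_length_m = 130.0
peak_beam_power_w = 18e6        # quoted beam power at the Linac exit

# Total number of particles in one pulse.
particles_per_pulse = particles_per_bunch * bunches_per_pulse

# Kinetic energy carried by one pulse, in joules.
pulse_energy_j = particles_per_pulse * exit_energy_ev * EV_TO_JOULE

# Pulse duration implied by the quoted beam power.
implied_pulse_length_s = pulse_energy_j / peak_beam_power_w

# Average accelerating gradient along the Linac, in MV/m.
mean_gradient_mv_per_m = (exit_energy_ev - injection_energy_ev) / linac_length_m / 1e6

print(f"particles per pulse  : {particles_per_pulse:.1e}")
print(f"energy per pulse     : {pulse_energy_j:.0f} J")
print(f"implied pulse length : {implied_pulse_length_s * 1e6:.1f} microseconds")
print(f"mean gradient        : {mean_gradient_mv_per_m:.2f} MV/m")
```

The implied pulse length of roughly twenty microseconds and the gradient of about 3 MV/m are consistent with the beam power and electric field quoted above.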

The H⁻ beam is injected into a 475 m circumference circular synchrotron accelerator called 
the Booster. After the first bending magnet in the Booster, the beam passes through a 
carbon foil, which strips off the electrons and leaves an H⁺ (proton) beam. Since the 20 μs 
pulse length of the Linac is much longer than the 2.2 μs revolution period of the Booster, 
the H⁻ pulse is present for several rotations of the new proton beam inside the Booster. The 
fact that the incoming pulse is made of H⁻ instead of protons allows the merging of the two 
beams inside the Booster. The choice of H⁻ for the linear accelerator is therefore not driven 
by the acceleration process, but by the complex engineering process of transferring the beam 
from the linear accelerator to a circular accelerator. The same process is also used at the 
Brookhaven accelerator complex [31]. 

The Booster synchrotron accelerates charged particles thanks to a resonant frequency cavity. 
As their momentum increases, particles are kept at a constant radius by a corresponding 
increase of the magnetic field. The proton beam is accelerated every turn by a 500 kV 
voltage drop. After completing 16,000 turns in 33 ms, the beam reaches 8 GeV, exits the 
Booster and enters the Main Injector synchrotron accelerator. The Main Injector injects 
150 GeV protons into the Tevatron synchrotron accelerator and 120 GeV protons into the 
antiproton production complex. 
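
The Booster numbers can be checked in the same spirit. The sketch below multiplies the energy gain per turn by the number of turns and compares the average time per turn with the revolution period expected from the 475 m circumference. The proton rest energy and the speed of light are standard constants not quoted in the text, and the function and variable names are illustrative.

```python
import math

PROTON_MASS_EV = 938.272e6      # proton rest energy in eV (standard value, not from the text)
C_LIGHT_M_PER_S = 299_792_458.0

circumference_m = 475.0
turns = 16_000
gain_per_turn_ev = 500e3        # one 500 kV voltage drop per turn
ramp_time_s = 33e-3

def beta_from_kinetic(kinetic_ev: float) -> float:
    """Speed as a fraction of c for a proton of given kinetic energy."""
    gamma = 1.0 + kinetic_ev / PROTON_MASS_EV
    return math.sqrt(1.0 - 1.0 / gamma**2)

def revolution_period_s(kinetic_ev: float) -> float:
    """Time needed for one turn around the Booster."""
    return circumference_m / (beta_from_kinetic(kinetic_ev) * C_LIGHT_M_PER_S)

final_energy_ev = turns * gain_per_turn_ev          # energy gained over all turns
period_at_injection_s = revolution_period_s(400e6)  # beam arriving from the Linac
period_at_extraction_s = revolution_period_s(8e9)   # beam leaving the Booster
mean_period_s = ramp_time_s / turns                 # average time per turn

print(f"energy gained        : {final_energy_ev / 1e9:.1f} GeV")
print(f"period at injection  : {period_at_injection_s * 1e6:.2f} microseconds")
print(f"period at extraction : {period_at_extraction_s * 1e6:.2f} microseconds")
print(f"mean period          : {mean_period_s * 1e6:.2f} microseconds")
```

The average time per turn falls between the injection and extraction periods, as expected for a beam that speeds up during the ramp.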

3.2.2 Antiproton Source 

Antiproton production occurs in the antiproton source. The bunched beam of 120 GeV 
protons from the Main Injector strikes a 7 cm nickel target every 1.5 s. Particles 
created in the forward direction are collected by a lithium lens. A pulsed magnet 
acting as a charge-mass spectrometer selects only antiprotons. The antiproton beam is 
pulsed, which means that it exhibits a large energy spread and a small time spread. To 
be debunched, the beam is passed into another synchrotron accelerator, called the Debuncher. 
Low (high) energy antiprotons follow the interior (exterior) path and arrive at different 
times at the resonant frequency cavity. As they see different phases, low (high) energy 
antiprotons are accelerated (decelerated). After about 100 ms, the antiproton beam is almost 
continuous, having a small energy spread and a large time spread. After 1.5 seconds in the 
Debuncher, the beam is injected into another synchrotron called the Accumulator. A new pulsed 
antiproton beam is then inserted into the Debuncher. It takes 1 million 120 GeV protons 
hitting the nickel target for 20 antiprotons at 8 GeV each to be injected into the Accumulator. 

The Accumulator uses stochastic cooling to accumulate antiprotons while keeping them at 
the desired (very small) longitudinal (transverse) momentum for hours, even days. The 
Accumulator has the shape of a triangle with rounded corners. Stochastic cooling [32] 
transforms particles from a hot state, with large spreads in energy, to a cooler state, with 
smaller spreads in energy, thanks to a feedback technique using pickups and kickers. The 
triangular shape of the Accumulator is driven by the necessity to have several 16-meter-long 
straight sections to accommodate the pickups and kickers for the stochastic cooling [33]. 
Simon van der Meer shared the 1984 Nobel Prize in Physics for inventing stochastic cooling. 

The continuous beam of 8 GeV antiprotons from the Accumulator is injected into the Main 
Injector. In 1998 the Main Injector replaced the Main Ring, which was situated in the same 
tunnel as the Tevatron. This represents one of the major upgrades from Run I to Run II. The 
Main Injector accelerates both protons and antiprotons in the same ring, using the same 
magnetic field to keep these particles on a circular trajectory. Antiprotons at 150 GeV 
are sent into the Tevatron accelerator, where they are accelerated to 980 GeV and collided 
with the proton beam. When a store ends, almost 75% of the antiprotons survive. Since 
creating antiprotons is such a hard task, the surviving antiprotons are recovered in another 
synchrotron accelerator, called the Recycler. 

The Recycler sits just above the Main Injector and acts as a fixed-energy storage ring thanks 
to its permanent magnets and stochastic cooling. The Recycler receives antiprotons both 
from the Accumulator and from the Tevatron at the end of a store. The Recycler acts as 
an antiproton storage ring until the Tevatron is ready to accept antiprotons in a new store. 

3.2.3 Tevatron 

When built in 1983, the Tevatron was the first superconducting synchrotron accelerator. 
The Tevatron's 1000 superconducting electromagnets can produce a magnetic field as large 
as 4.2 Tesla. Electromagnet coils are made of 8 mm niobium-titanium alloy wire. One coil 
contains about 70,000 km of wire. A dipole magnet is about 6.4 m long. Once per turn, 
particles receive a kick in energy of about 650 keV from a resonant frequency cavity. In 
about 20 seconds the magnetic field increases gradually from 0.66 Tesla to 3.54 Tesla, while 
the beam energies increase gradually from 150 GeV to 800 GeV. Meanwhile, the beams travel 
around the 1 km radius circular accelerator 1 million times. When the beams arrive at 980 
GeV, an electric current of more than 4 kA flows through the electromagnets and creates 
a magnetic field of 4.2 Tesla. For comparison, the superconducting magnets at the LHC 
will run at 8.4 Tesla when the beam energy is 7 TeV. Superconducting electromagnet 
coils kept at liquid helium temperature (4.3 K) have no resistance and therefore dissipate no 
energy through the Joule effect. Significantly larger currents are thus able to flow through 
these coils in order to produce very large magnetic fields. The Tevatron's cryogenic system 
is one of the world's largest, along with those of HERA and the LHC. It can absorb 23 kW of 
power while still maintaining the liquid helium temperature. The system can deliver 
1000 litres/hour of liquid helium at 4.2 K. 
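
The ramp figures above hang together, as the following Python sketch illustrates. It assumes ultra-relativistic beams (speed equal to c), a circular orbit of exactly 1 km radius, and a dipole field that scales linearly with beam energy at fixed orbit radius; these simplifications and all names in the code are assumptions made for illustration.

```python
import math

C_LIGHT_M_PER_S = 299_792_458.0

ring_radius_m = 1000.0      # "1 km radius" quoted in the text
ramp_time_s = 20.0
kick_per_turn_ev = 650e3    # energy gained at each pass through the RF cavity

# Number of turns completed during the ramp, for particles moving at c.
turns_during_ramp = ramp_time_s * C_LIGHT_M_PER_S / (2.0 * math.pi * ring_radius_m)

# Energy gained during the ramp, in GeV.
energy_gain_gev = turns_during_ramp * kick_per_turn_ev / 1e9

def dipole_field_t(energy_gev: float,
                   ref_energy_gev: float = 980.0,
                   ref_field_t: float = 4.2) -> float:
    """Field needed at a given beam energy, scaled from the 980 GeV operating point."""
    return ref_field_t * energy_gev / ref_energy_gev

print(f"turns during ramp : {turns_during_ramp:.3g}")
print(f"energy gained     : {energy_gain_gev:.0f} GeV")
for energy in (150.0, 800.0, 980.0):
    print(f"B at {energy:5.0f} GeV    : {dipole_field_t(energy):.2f} T")
```

Under these assumptions the beam makes close to a million turns and gains several hundred GeV during the ramp, in line with the 150 GeV to 800 GeV increase quoted above.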

Table 3.1 summarizes the acceleration characteristics of the different stages of the Fermilab 
pp̄ Accelerator Complex. In this table, $\beta = v/c$ expresses the speed of the particle as a 
fraction of the speed of light in vacuum and $\gamma = \frac{E}{mc^2} = \frac{1}{\sqrt{1-\beta^2}}$ 
is the relativistic factor. Also, for highly relativistic particles, kinetic and total 
energies can be approximated to be the same. 



Table 3.1: Performance of the Fermilab Accelerator Complex, where C-W = Cockcroft-Walton, 
L = Linac, B = Booster, Debuncher and Recycler, M = Main Injector, T = Tevatron, 
A = Accelerator, E = Energy. 

A    H (room temp.)    H⁻ source    C-W        L          B        M          T
E    0.04 eV           25 keV       750 keV    400 MeV    8 GeV    150 GeV    0.98 TeV
β    9.1 · 10⁻⁸        0.01         0.04       0.71       0.99     1          1
γ    1                 1            1          1.43       9.53     161        1067
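
As an illustration of how the β and γ entries follow from the kinetic energy, the sketch below evaluates both factors for the accelerator stages. It treats every beam particle as a proton with rest energy 938.272 MeV (an assumption, since the early stages accelerate H⁻ ions, which are marginally heavier); small differences with respect to tabulated values can come from rounding or from the exact beam energy assumed.

```python
import math

PROTON_MASS_MEV = 938.272   # proton rest energy; the H- ion is approximated as a proton

def beta_gamma(kinetic_mev: float) -> tuple[float, float]:
    """Return (beta, gamma) for a proton with the given kinetic energy in MeV."""
    total_mev = kinetic_mev + PROTON_MASS_MEV
    # pc from E^2 = (pc)^2 + (mc^2)^2, written in a form that stays accurate at low energy
    momentum_mev = math.sqrt(kinetic_mev * (kinetic_mev + 2.0 * PROTON_MASS_MEV))
    return momentum_mev / total_mev, total_mev / PROTON_MASS_MEV

STAGES_MEV = {
    "Cockcroft-Walton": 0.750,
    "Linac": 400.0,
    "Booster": 8_000.0,
    "Main Injector": 150_000.0,
    "Tevatron": 980_000.0,
}

for name, kinetic in STAGES_MEV.items():
    beta, gamma = beta_gamma(kinetic)
    print(f"{name:18s} beta = {beta:.4f}   gamma = {gamma:.2f}")
```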



3.2.4 Collisions and Luminosity 

Store 

When 36 new bunches of protons and 36 new bunches of antiprotons enter the Tevatron, 
a new store is said to start. A typical bunch length is 0.43 meters. Both beams have 
an average energy per accelerated particle of 980 GeV. A proton bunch typically contains 
3.30 × 10¹¹ protons. An antiproton bunch typically contains 3.60 × 10¹⁰ antiprotons. Since 
antiprotons are antimatter, they annihilate with regular matter. This is why about one 
order of magnitude fewer antiprotons than protons are accumulated. As the two beams collide 
head on at a rate of 2.5 million times per second, pp̄ scatterings occur at a certain rate per 
unit of area, which is described by the instantaneous luminosity. 

Luminosity 

The Tevatron's instantaneous luminosity is given, up to constant factors, by: 

$$\mathcal{L} \propto \frac{f \cdot E}{\epsilon_n} \cdot \frac{N_b \, N_p \, N_{\bar{p}}}{\beta^*}. \tag{3.1}$$

The first fraction represents quantities that cannot be easily changed after the experiment is 
started, such as f, the beam revolution frequency at the Tevatron, which is set by the radius 
and the speed of light c; E, the beam energy set by the physics goals of the experiment; 
ε_n, the beam emittance at injection, set by getting the beam into the Tevatron. The second 
fraction represents quantities that can be changed easily during the period of taking data, 
such as N_b, the number of proton or antiproton bunches found at one time in the Tevatron; 
β*, the strength of the final focus; N_p (N_p̄), the number of protons (antiprotons) per bunch. 

However, the instantaneous luminosity delivered by the Tevatron is not calculated by 
experiments using this formula. It is rather measured by a special detector apparatus 
described in Section 3.3.2. 

As the store's duration increases, the instantaneous luminosity decreases exponentially, in 
the first few hours due to intra-beam scattering and later due to antiproton depletion. 
The instantaneous luminosity is expected to fall to 50% of its initial value in 7 hours and 
to 1/e of it in 12 hours. Typically a store is ended after 24 hours, when the proton and 
antiproton bunches are evacuated from the Tevatron. Subsequently new bunches are inserted 
in the Tevatron and a new store starts. 

Table 3.2 compares various parameters of Run I and Run II of the Tevatron, especially in 
terms of luminosity [34]. 

Table 3.2: Comparison between Run I and Run II at the Tevatron for various accelerator 
parameters, especially luminosity. 

Parameter                                                    Run Ib        Run II
Number of bunches (N_b)                                      6             36
Number of protons per bunch (N_p)                            2.3 × 10¹¹    2.7 × 10¹¹
Number of antiprotons per bunch (N_p̄)                        5.5 × 10¹⁰    3.0 × 10¹⁰
Maximum number of antiprotons in a store                     3.3 × 10¹¹    1.1 × 10¹²
β* [cm]                                                      35            35
Bunch length [m]                                             0.6           0.37
Time interval between consecutive bunch crossings [ns]       3500          396
Number of pp̄ collisions per bunch crossing                   2.5           2.3
Energy per p or p̄ [GeV]                                      900           980
Integrated luminosity delivered by the Tevatron [pb⁻¹]       112           8000
Peak instantaneous luminosity at the Tevatron [cm⁻² s⁻¹]     2.0 × 10³¹    3.6 × 10³²



A typical integrated luminosity per week is ∫ℒ dt = 8 pb⁻¹. Figure 3.3 shows the weekly 
and integrated luminosity delivered by the Tevatron since the start of Run II. The analysis 
presented in this thesis uses an integrated luminosity of 5.7 fb⁻¹. A dataset of about 8.0 
fb⁻¹ is currently under preparation to be shown at the Summer conferences of 2011. 



Figure 3.3: Tevatron's delivered weekly (left scale) and integrated luminosity (right scale) 
from Jan 2001 to Apr 2011. The analysis presented in this thesis uses an integrated 
luminosity of 5.7 fb⁻¹. Credit image to the CDF collaboration. 



Quench 



Stores may end prematurely when the beam is lost in a process called quenching. Quenches 
may happen when a beam hits a superconducting magnet. The magnet is then locally no longer 
superconducting and releases energy by the Joule effect [35]. Soon the whole magnet 
warms up and is no longer superconducting. Physicists then need to wait for the whole 
magnet to be cooled down to liquid helium temperature before inserting a new store in the 
Tevatron. 



Number of Collisions 



Accelerator-based particle physics examines the final-state particles created in collisions 
of the initial beam particles. The most interesting processes often have very rare signatures. 
Counting experiments count the number of events in which a particular signature appears; any 
excess above the estimated background is considered as signal. Per unit time, a signature 
occurs in a number of events proportional to the physical probability of occurrence (cross 
section σ) and to the beam collision conditions in the accelerator complex (instantaneous 
luminosity ℒ). However, not all events are reconstructed and identified by the particle 
physics detector. The experimental efficiency (ε) measures the fraction of events that are 
seen correctly by the detector. Therefore, the observed number of events per unit of time is 
given by 

$$\frac{dN_{obs}}{dt} = \epsilon \cdot \sigma \cdot \mathcal{L}. \tag{3.2}$$



Integrating the instantaneous luminosity in time provides the integrated luminosity, which 
gives the total number of observed events: 

$$N_{obs} = \epsilon \cdot \sigma \cdot \int \mathcal{L} \, dt. \tag{3.3}$$

Since the physical cross section of many processes increases with the centre-of-mass energy 
(√s), particle physicists try to build accelerators with larger and larger √s. Furthermore, 
particle physicists try to build better detectors with reconstruction efficiencies for various 
particles very close to one. 
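
Equations 3.2 and 3.3 translate directly into a few lines of Python. In the sketch below the cross section of 100 fb and the efficiency of 2% are purely hypothetical inputs chosen for illustration; only the integrated luminosity of 5.7 fb⁻¹ and the peak instantaneous luminosity of 3.6 × 10³² cm⁻² s⁻¹ are taken from the text. The function names are likewise illustrative.

```python
FEMTOBARN_IN_CM2 = 1e-39   # 1 barn = 1e-24 cm^2, so 1 fb = 1e-39 cm^2

def expected_events(cross_section_fb: float,
                    integrated_luminosity_fb_inv: float,
                    efficiency: float) -> float:
    """Equation 3.3: N_obs = efficiency * sigma * integrated luminosity."""
    return efficiency * cross_section_fb * integrated_luminosity_fb_inv

def event_rate_hz(cross_section_fb: float,
                  luminosity_cm2_s: float,
                  efficiency: float) -> float:
    """Equation 3.2: dN_obs/dt = efficiency * sigma * instantaneous luminosity."""
    return efficiency * cross_section_fb * FEMTOBARN_IN_CM2 * luminosity_cm2_s

# Hypothetical process: sigma = 100 fb, selected with a 2% efficiency.
n_obs = expected_events(cross_section_fb=100.0,
                        integrated_luminosity_fb_inv=5.7,
                        efficiency=0.02)
rate = event_rate_hz(cross_section_fb=100.0,
                     luminosity_cm2_s=3.6e32,
                     efficiency=0.02)

print(f"expected events in 5.7 fb^-1 : {n_obs:.1f}")
print(f"rate at peak luminosity      : {rate:.2e} events per second")
```

Expressing the cross section in fb and the integrated luminosity in fb⁻¹ makes the units cancel without any conversion factor, which is why collider results are usually quoted in these units.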
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3.3 The Collider Detector at Fermilab 



The Collider Detector at Fermilab II (CDF II or simply CDF) is a general purpose 
subatomic particle detector that surrounds one of the two pp̄ collision points of the Tevatron 
accelerator³ and is described in detail in the Technical Design Report of the CDF II 
Detector [36] and in the CDF paper [37]. During Tevatron Run I there was an older version of 
CDF, CDF I [38]. The detector underwent a major upgrade in preparation for Run II of the 
Tevatron, thus creating CDF II. 

The CDF detector has a cylindrical symmetry (i.e. both azimuthal and forward-backward) 
around the proton-antiproton beam pipe. CDF is formed of three main subdetector systems. 
The first and innermost one is the tracking system. It is made of a set of silicon strips and 
an open-cell drift chamber inside a solenoid-produced magnetic field. This system measures 
the momenta of electrically charged particles and the displacements of secondary particle 
vertices with respect to primary vertices⁴. Next comes the calorimeter system, which measures 
the energies of both electrically charged and electrically neutral particles as they produce a 
shower of secondary particles through interaction with dense matter⁵. Finally, outside the 
calorimeter system is located the muon subdetector, which measures the momenta of muon 
candidates⁶. 

Besides the three main subdetector systems, CDF has a subdetector system called Time 
of Flight (TOF) that measures the mass of low-momentum charged particles, as well as an 
instantaneous luminosity counter called the Cherenkov Luminosity Counter (CLC) that uses 
Cherenkov light emission to measure the number of pp̄ collisions per second inside CDF. 

The CDF detector is built around the "beam pipe", which is a vacuum pipe with a diameter 
of 2.2 cm through which the proton and antiproton beams circle around the Tevatron. The 
pipe is made of beryllium because the material provides the lowest particle interaction cross 
section possible⁷ while also possessing good mechanical properties. Proton beams circulate 
clockwise as seen from the top and enter CDF from the west side, while antiproton beams 
circulate counterclockwise and enter CDF from the east side. 

This section will cover in detail the different subdetectors of CDF II briefly described above. 
Figure 3.4 shows a diagram of CDF II with one quadrant taken out so that we can 
clearly see the interior. The various subdetector systems are labelled in the diagram. 

3 The other general purpose detector at the Tevatron is DZero. There is a friendly competition between 
CDF and DZero, with each detector collaboration striving to achieve a physics result first, but with the other 
collaboration needed to confirm the first result before it is fully accepted by the particle physics community. 
Moreover, only together do CDF and DZero have a chance to exclude or discover the Standard Model Higgs 
boson. Therefore, it is customary that, once or twice a year, CDF and DZero combine their Higgs boson 
searches in what is called a "Tevatron combination". 

4 The primary vertex is the position of the primary hard-scattering interaction. 

5 The energies of all the secondary particles from the shower are measured as they are absorbed by the 
calorimeter system. They are all added to obtain the energy of the initial particle. 

6 Muons are minimum ionizing particles and leave only very faint energy deposits in the calorimeter 
system. This is why we need a dedicated muon detector system on the outer part of CDF. 

7 The atomic number Z of beryllium is 4. Typically the cross section of interaction of a subatomic particle 
with a material increases with Z. 




Figure 3.4: A schematic view of the CDF II detector, with a quadrant taken out to reveal 
the interior as well. Various subdetector systems are labelled in the diagram. Proton 
beams enter from the west side and antiproton beams from the east side. Credit image to 
the CDF collaboration. 



3.3.1 The CDF Coordinate System 

The coordinate system used by CDF reflects the cylindrical symmetry of the detector. 
However, depending on the situation, either a cylindrical or an adapted spherical coordinate 
system is used. 

The z axis lies along the beamline, with the positive +z side in the direction the protons 
travel (from west to east). A longitudinal plane is parallel to the z axis, while a transverse 
plane is perpendicular to the z axis. Although the Cartesian coordinates x and y are not 
typically used, for the sake of completeness we should note that the +x direction is horizontal 
towards north and the +y direction is vertical and upward. 

The radial coordinate r is the radial distance from the beamline and is expressed by the 
formula: 

$$r = \sqrt{x^2 + y^2}. \tag{3.4}$$

The azimuthal angle, denoted φ, represents the angle made around the beamline and is 
expressed by the formula: 

$$\tan\phi = \frac{y}{x}. \tag{3.5}$$

Figure 3.5: A to-scale plane view of the CDF detector that also shows the elevation. The 
west side (from where the proton beams enter CDF) is on the right of the picture. Credit 
image to the CDF collaboration. 



The polar angle, denoted θ, represents the angle made with respect to the beamline and is 
expressed by the formula: 

$$\tan\theta = \frac{r}{z}. \tag{3.6}$$

However, since θ is not a Lorentz-invariant quantity⁸, it is common to use instead a derived 
quantity called rapidity (y), whose differences are invariant under Lorentz boosts along the 
beam axis and which is defined by the formula 

$$y = \frac{1}{2} \ln \frac{E + p_z c}{E - p_z c}, \tag{3.7}$$

where E is the energy and p_z is the longitudinal momentum of the particle described. 

Moreover, since in collider physics the particles involved have total energies much larger 
than their rest energies, we can use the approximation $E \approx pc$. As such, the rapidity is 
approximated by a new quantity called pseudorapidity (η), which is not strictly Lorentz 
invariant, but is approximately so, and is defined by 

$$\eta = -\ln \tan \frac{\theta}{2}. \tag{3.8}$$

8 Protons and antiprotons are extended objects travelling along the z axis with an energy of 980 GeV. 
However, not all elementary constituents of protons (generically called partons, such as valence quarks, sea 
quarks and gluons) have the same momentum along the z axis and the number of particles per unit of θ 
angle (dN/dθ) is not Lorentz invariant. 

The advantage of the η quantity is that it does not depend on the mass of the particle 
and therefore is a purely geometrical quantity that can be used to describe the trajectory of 
all elementary particles. The cylindrical coordinate system (r, φ, z) is used to describe the 
geometry of the detector, while the adapted spherical coordinates (η, φ) are used to describe 
the direction of a particle inside the detector. 
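
The relation between polar angle, rapidity and pseudorapidity can be illustrated numerically. The sketch below implements equations 3.7 and 3.8 in natural units (c = 1) and compares the two quantities for a charged pion; the pion mass of 0.13957 GeV/c², the momentum of 20 GeV/c and the polar angle of 30° are example values chosen here, and the function names are illustrative.

```python
import math

def pseudorapidity(theta_rad: float) -> float:
    """Equation 3.8: eta = -ln tan(theta / 2)."""
    return -math.log(math.tan(theta_rad / 2.0))

def theta_from_eta(eta: float) -> float:
    """Inverse of equation 3.8: polar angle in radians for a given eta."""
    return 2.0 * math.atan(math.exp(-eta))

def rapidity(energy: float, pz: float) -> float:
    """Equation 3.7 in natural units (c = 1)."""
    return 0.5 * math.log((energy + pz) / (energy - pz))

# Example: a charged pion with |p| = 20 GeV/c emitted at theta = 30 degrees.
pion_mass = 0.13957
p = 20.0
theta = math.radians(30.0)

pz = p * math.cos(theta)
energy = math.sqrt(p**2 + pion_mass**2)

eta = pseudorapidity(theta)
y = rapidity(energy, pz)

print(f"eta = {eta:.5f}")
print(f"y   = {y:.5f}")
print(f"theta for eta = 1: {math.degrees(theta_from_eta(1.0)):.1f} degrees")
```

For such a relativistic particle the two quantities agree to better than one part in a thousand, and the rapidity is slightly smaller than the pseudorapidity because of the non-zero mass.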

A quantity R is used to measure the distance between two directions of particles inside the 
detector (in the η-φ plane): 

$$R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}. \tag{3.9}$$

For a particle with energy E and momentum p, we define transverse energy as E_T = E sin θ 
and transverse momentum as p_T = p sin θ. 

3.3.2 The Cherenkov Luminosity Counter 

Although Tevatron accelerator personnel measure the instantaneous luminosity that the 
Tevatron is delivering, only a fraction of it, albeit one close to unity, is recorded by particle 
detectors. This is why each detector has its own subdetector that measures instantaneous 
luminosity as well. 

The general formula of instantaneous luminosity is: 

$$N = L \cdot \sigma \cdot \epsilon, \tag{3.10}$$

where N is the number of events (the number of hard (inelastic) scattering interactions) 
recorded per unit time, L the instantaneous luminosity of the beam, σ the cross section 
of the given event and ε the efficiency of observing that event in the detector after all 
selection requirements (the fraction of signal events that pass the selection requirements). In 
this particular case, N represents the average number of inelastic pp̄ scattering interactions 
per bunch crossing (μ) times the number of bunch crossings per second (f_bc, the rate of 
bunch crossings): 

$$N = \mu \cdot f_{bc}. \tag{3.11}$$



By plugging equation 3.11 into equation 3.10 we obtain the following equation: 

$$\mu \cdot f_{bc} = L \cdot \sigma \cdot \epsilon, \tag{3.12}$$

and we can find an expression for the instantaneous luminosity L: 

$$L = \frac{\mu \cdot f_{bc}}{\sigma \cdot \epsilon}, \tag{3.13}$$

where f_bc is the number of bunch crossings per second at the Tevatron Run II (1/396 ns), and σ 
is the cross section of pp̄ inelastic scattering, which has been determined experimentally [39]. 
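
Equation 3.13 can be evaluated directly, as in the minimal sketch below. The bunch-crossing interval of 396 ns and the value of 2.3 interactions per crossing come from the text; the inelastic cross section of 60 mb and the efficiency of 1 are assumptions made only to obtain a numerical illustration, and the function name is hypothetical.

```python
MILLIBARN_IN_CM2 = 1e-27   # 1 barn = 1e-24 cm^2

def instantaneous_luminosity(mu: float,
                             f_bc_hz: float,
                             sigma_inel_cm2: float,
                             efficiency: float) -> float:
    """Equation 3.13: L = mu * f_bc / (sigma * efficiency), in cm^-2 s^-1."""
    return mu * f_bc_hz / (sigma_inel_cm2 * efficiency)

f_bc = 1.0 / 396e-9                      # bunch-crossing rate for a 396 ns spacing
sigma_inel = 60.0 * MILLIBARN_IN_CM2     # assumed inelastic cross section (illustrative)

lumi = instantaneous_luminosity(mu=2.3,
                                f_bc_hz=f_bc,
                                sigma_inel_cm2=sigma_inel,
                                efficiency=1.0)   # assumed perfect efficiency

print(f"bunch-crossing rate : {f_bc:.3e} Hz")
print(f"luminosity          : {lumi:.2e} cm^-2 s^-1")
```

A lower detection efficiency for the same measured μ implies a proportionally higher luminosity, which is why the efficiency has to be determined in dedicated studies.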

Only μ and ε are not known. For this purpose, CDF uses long conical gaseous Cherenkov 
counters to measure the average number of inelastic interactions per bunch crossing (μ) and 
the efficiency to detect them (ε). The name of this detector subsystem is the Cherenkov 
Luminosity Counter (CLC) [40] [41]. It has been designed especially for CDF II in order to 
cope with the increased Tevatron instantaneous luminosity (on the order of 2 × 10³² cm⁻² s⁻¹) 
and the decreased time interval between consecutive bunch crossings (396 ns). 

This subdetector system is made of two modules located in the forward regions of CDF (both 
on west and east sides), immediately after the beam pipe, in a 3° gap between the beam 
pipe and the rest of the detector systems, corresponding to an η region of 3.7 < |η| < 4.7, as 
we can see in Figure 3.6. 







Figure 3.6: Location of the Cherenkov Luminosity Counter in CDF. Credit image to the 
CDF collaboration. 



Each CLC module is formed of 3 concentric layers of 18 Cherenkov counters each. Each 
Cherenkov counter is long and conical, stretching from the interaction region towards one 
of the forward regions. The Cherenkov cones in the two outer layers (the inner layer) 
have a length of about 180 (160) cm. Being conical, their diameter increases from 2 cm at 
the end nearest the interaction region to 6 cm at the far end. At the latter end there is a 
conical mirror that collects light into a 2.5 cm diameter photomultiplier tube (PMT). The 
light is produced by particles created in pp̄ collisions that travel at large pseudorapidities, 
thus travelling inside only one cone and emitting Cherenkov light as they travel through the 
gas in the cone. Each PMT has a 1 mm thick concave-convex quartz window to collect the 
ultraviolet light of the Cherenkov spectrum, and operates with a gain of 2 × 10⁶. The 
gas is isobutane at atmospheric pressure (1 atm), with the possibility to 
increase the pressure up to 2 atm if there is a need to increase the yield of Cherenkov light. 
In this gas, particles emit Cherenkov light at an angle of 3.1° if they have a 
momentum larger than 9.3 MeV/c (electrons) or 2.6 GeV/c (pions). 
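
The quoted momentum thresholds follow from the Cherenkov condition, as the sketch below illustrates. It assumes that 3.1° is the asymptotic emission angle for particles with β close to 1, so that the refractive index is n = 1/cos(3.1°), and it uses the standard relation cos θ = 1/(nβ), which gives a threshold momentum of mc/√(n² − 1). The particle masses are standard values not quoted in the text, and the function names are illustrative.

```python
import math

ELECTRON_MASS_MEV = 0.510999   # rest energies in MeV (standard values)
PION_MASS_MEV = 139.570

def refractive_index_from_max_angle(theta_max_deg: float) -> float:
    """Index n such that a particle with beta -> 1 radiates at theta_max (cos(theta) = 1/n)."""
    return 1.0 / math.cos(math.radians(theta_max_deg))

def threshold_momentum_mev(mass_mev: float, n: float) -> float:
    """Momentum above which a particle radiates: beta > 1/n, i.e. p > m c / sqrt(n^2 - 1)."""
    return mass_mev / math.sqrt(n**2 - 1.0)

n_gas = refractive_index_from_max_angle(3.1)   # assumed asymptotic Cherenkov angle
p_electron = threshold_momentum_mev(ELECTRON_MASS_MEV, n_gas)
p_pion = threshold_momentum_mev(PION_MASS_MEV, n_gas)

print(f"refractive index   : {n_gas:.5f}")
print(f"electron threshold : {p_electron:.1f} MeV/c")
print(f"pion threshold     : {p_pion / 1000.0:.2f} GeV/c")
```

Under this assumption the computed thresholds come out close to the values quoted above. Because the threshold scales with the particle mass, soft secondary pions stay below it and produce no light, while electrons radiate at very low momenta.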

The conical geometry and orientation were chosen so that particles produced in pp̄ collisions 
close to the centre of the CDF detector travel a large distance inside the CLC, 
producing a large light yield (several hundred photo-electrons), while particles from the 
beam halo and secondary particles travel at smaller |η|, cover a shorter distance 
in the CLC and give a smaller light yield. 

By requiring a certain minimum light yield threshold in each channel, the background from 
beam halo and secondary particles is rejected and only the signal of particles produced in pp̄ 
interactions is measured. Also, each module has an excellent time resolution of less than 100 
ps. This makes it possible to require coincident hits in the two modules in the forward and 
rear regions of CDF with respect to the z axis, which improves the separation between signal 
and background. 

Finally, the CLC subdetector continuously measures the parameter μ, while dedicated studies 
of the CLC have measured the efficiency ε. Using formula 3.13, CDF measures the instantaneous 
luminosity it records. The uncertainty on the instantaneous luminosity measurement is 
5.9% [42] and is quoted as a systematic uncertainty on any measurement at CDF, including 
the Higgs boson search described in this thesis. 

3.3.3 The Tracking Systems 

The CDF detector has two tracking systems that provide very precise measurements of 
charged particle momenta. Both follow the cylindrical geometry of CDF and are embedded 
in a 1.4 T solenoidal magnetic field. Inside the magnetic field, charged particles follow 
helical trajectories, whose parameters⁹ are measured by a silicon strip vertex detector and 
a cylindrical open-cell drift chamber. 
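
The link between curvature and momentum can be made concrete with the standard relation p_T [GeV/c] ≈ 0.3 · B [T] · R [m] for a particle of unit charge. The Python sketch below is an illustration under that textbook relation, not the reconstruction code used by CDF; the function names are hypothetical, and the radius of 1.322 m used in the example is the outer radius of the drift chamber.

```python
GEV_PER_TESLA_METRE = 0.299792458   # p_T [GeV/c] = 0.2998 * B [T] * R [m] for unit charge

def radius_of_curvature_m(pt_gev: float, b_field_t: float = 1.4) -> float:
    """Radius of the circle described in the transverse plane by a unit-charge particle."""
    return pt_gev / (GEV_PER_TESLA_METRE * b_field_t)

def min_pt_to_reach_radius_gev(radius_m: float, b_field_t: float = 1.4) -> float:
    """Smallest p_T for which a track starting on the beamline reaches a given radius.

    The track circle passes through the beamline, so its farthest point from the
    beamline lies at twice the radius of curvature.
    """
    return GEV_PER_TESLA_METRE * b_field_t * radius_m / 2.0

r_20gev = radius_of_curvature_m(20.0)
pt_min_outer = min_pt_to_reach_radius_gev(1.322)

print(f"radius of curvature at p_T = 20 GeV/c : {r_20gev:.1f} m")
print(f"minimum p_T to reach r = 1.322 m      : {pt_min_outer:.3f} GeV/c")
```

A 20 GeV/c track is therefore almost straight on the scale of the tracker, which is why a precise momentum measurement requires hit resolutions at the level of tens to hundreds of micrometres.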

The closest to the beam pipe is the silicon detector, which is made up of three subdetectors: 
Layer 00 (L00), the Silicon Vertex Detector (SVX-II) and the Intermediate Silicon Layers 
(ISL). It allows precise measurements of the z coordinate for the primary interaction vertex, 
impact parameters and φ for tracks, as well as identification of secondary vertices in jets 
originating from b-hadrons. Next comes the drift chamber, which is called the Central Outer 
Tracker (COT). A schematic view of the tracking systems in CDF is presented in Figure 3.7. 

9 The tracking helical parameters are usually expressed by five non-independent variables: φ, η (or cot θ), 
curvature, z and d (the impact parameter, or the distance of minimum approach between a track and the 
beam axis). 

Figure 3.7: Schematic view of the tracking systems at CDF. Credit image to the CDF 
collaboration. 

The Solenoid 

The superconducting solenoid is 5 m long, is made of Nb-Ti stabilized with Al, is operated 
at a current of about 5650 A, and generates a uniform 1.4 T magnetic field parallel to the 
z axis and pointing in the direction of the proton beam. The Lorentz force bends the 
trajectories of charged particles. Measuring their curvatures with the tracking systems allows 
precise measurements of the momenta of charged particles. 

The Silicon Vertex Detector 

The physics principle of a silicon detector is a reverse-biased semiconductor p-n junction. 
When a charged particle passes through, it deposits energy in the detector material through 
ionization. This creates electron-hole pairs. Electrons drift towards the anode and holes 
drift towards the cathode. The energy deposited by the incoming particle is estimated from 
the amount of charge deposited in the detector, which is to first order proportional to the 
path length traversed in the material by the charged particle. 

CDF uses a strip geometry for its silicon microdetectors. This makes it possible to reconstruct 
the position of the particle as it travels through the detectors. The distance between two 
silicon microstrips is about 60 μm. A single incoming charged particle will typically deposit 
energy in more than one microstrip, forming a charge deposition called a "cluster". CDF 
employs two types of silicon microstrip detectors: single-sided and double-sided. The 
single-sided microstrip detectors have only the p side of the junction segmented into strips 
that are parallel to the z axis, thus allowing r-φ position measurements. The double-sided 
microstrip detectors have in addition the n side segmented in strips at a (stereo) angle with 
respect to the p-side strips (which are parallel to the z axis), thus allowing the measurement 
of the z position as well. 

Layer 00 

The first layer of the silicon detector, Layer 00 (L00) [43], is mounted directly on the beam 
pipe between 1.35 cm and 1.62 cm of radius. It has two overlapping hexagonal structures 
that can be seen in red in Figure 3.8. These detectors have to be very resistant to radiation 
from the beam pipe and have only single-sided microstrips. Although they provide only 
r-φ measurements, they improve the spatial resolution to about 15 μm per hit, whereas the 
resolution is 20 μm per hit without them. 




Figure 3.8: Detailed diagram of the Layer 00 silicon detector (red) mounted directly on the 
beam pipe at radii between 1.35 cm and 1.62 cm. In the figure we can also see the two 
innermost layers of SVX-II silicon detector. Credit image to the CDF collaboration. 



SVX-II 

Next comes the second silicon vertex detector, the SVX-II [33]. It extends from a radius of 
2.5 cm to a radius of 11 cm and it has a z coverage of |z| < 45 cm. SVX-II is formed of five 
concentric layers of double-sided silicon microstrip detectors, which allow for a 3D position 
measurement with a spatial resolution of about 20 μm. The first two layers can be seen in 
black in Figure 3.8 and all the five layers can be seen in black in Figure 3.9. 

Figure 3.9: Transverse schematic view of the silicon detector systems: Layer 00 (red), SVX- 
II (black) and Intermediate Silicon Layers (green). Credit image to the CDF collaboration. 



Intermediate Silicon Layers 

Further out in radius is the last silicon detector, the Intermediate Silicon Layers (ISL) [45]. 
The ISL is made of three double-sided detector layers. Going from small radius outwards, there 
is a layer with 1 < |η| < 2 at r = 20 cm, then a layer with |η| < 1 at r = 22 cm, followed 
finally by a layer with 1 < |η| < 2 at r = 28 cm. The three layers can be seen in green in 
Figure 3.9. 

Tracks in the central region are reconstructed using both silicon and COT information. 
However, only silicon information is used to reconstruct tracks in the forward region, 
1 < |η| < 2.8. 

The Central Outer Tracker 

Besides the silicon detector, the tracking system contains a cylindrical open-cell drift 
chamber called the Central Outer Tracker (COT) [46]. The COT has a cylindrical geometry, 
extends from a radius of 43.4 cm up to one of 132.2 cm and has a length of 3.1 m. The COT 
is made up of eight subdetectors called "superlayers", as can be seen in Figure 3.10. Four 
superlayers have their sense wires parallel to the z axis (axial superlayers) and therefore can 
measure trajectories in the r-φ plane. The other four superlayers have their sense wires at 
a 2° angle with respect to the z axis (stereo superlayers), which allows for measurements 
along the z axis. 

Figure 3.10: Schematic view of sense wire planes in the eight superlayers of COT. Credit 
image to the CDF collaboration. 



The 30,240 sense wires in the COT are divided approximately equally between the axial 
and stereo superlayers. Particles originating from the primary interaction vertex that have 
|η| < 1 pass through all eight COT superlayers, but those with 1 < |η| < 1.3 pass 
through only four superlayers. The COT is very precise for measurements in the r-φ plane 
(transverse momentum, p_T), but less so for measurements along the z axis (longitudinal 
momentum, p_z). 

The drift cells that constitute the COT are filled with a gas mixture of argon, ethane and 
isopropyl alcohol in the proportions 49.5:49.5:1. The motivation for this choice 
is that of having a constant drift velocity along the cell width. This produces a drift velocity 
of 100 μm/ns and a maximum drift time of 177 ns, which allows all the electric charges 
produced by ionization in the drift chamber by the incoming particles to drift away before 
the next bunch crossing takes place (every 396 ns). The electric field increases exponentially 
with decreasing distance to the sense wire. This produces an avalanche of electrons very 
close to the sense wire, which amplifies naturally the measured current and allows for a 
better measurement. 

Next, the electric currents read by the sense wires are processed by an ASDQ chip [47]. The 
ASDQ amplifies the signal, analyses its shape and height and allows for the measurement of 
the deposited charge, which is used to measure the ionization along the trail of the charged 
particle (dE/dx), a quantity that is very helpful in discriminating between different types 
of particles. The pulses are then digitized by Time to Digital Converter (TDC) boards in the CDF 
collision hall. 

Furthermore, pattern recognition algorithms (tracking algorithms) reconstruct helical 
trajectories of particles in the COT. The position resolution for a track is about 140 μm and 
the track p_T resolution, measured using cosmic rays, varies with the track p_T. 

(3.14) 

3.3.4 Time of Flight 

Another improvement of CDF during the upgrade to Tevatron Run II was the addition of a 
new subdetector aimed at improving particle identification at low momentum. A Time Of 
Flight (TOF) system [35] was built outside the solenoid magnet of the tracking system at 
a radius of approximately 1.5 m. The TOF measures the time that particles take to travel from 
the primary vertex of pp̄ collisions to the TOF system. This allows the separation of 
low-momentum protons, kaons and pions. 

The TOF system consists of 216 bars of scintillating material approximately 3 m in length 
and with a cross section of 4 × 4 cm². These bars are arranged in a cylindrical geometry 
around the tracking system. The TOF coverage in η is |η| < 1. The scintillating material 
was chosen to be Bicron 408, because it has a short rise time and a long attenuation length 
(308 cm). 

The particle separation principle is the following. All particles are produced at the same 
time from the same primary interaction. As they pass through the scintillating material, 
they produce small flashes of visible light that are detected by photomultiplier tubes (PMT) 
attached to both ends of each scintillating bar. Next a preamplifier circuit mounted directly 
on the PMT processes the light signal. Then the readout electronics digitizes both the 
amplitude and time for the light signal. The time interval is digitized by a Time to Digital 
Converter (TDC) only when the signal reaches a fixed discrimination threshold. Moreover, 
the TDC corrects for the effect that a larger amplitude signal reaches the threshold first 
(time walk effect). If a particle crosses the scintillating bar just in front of a PMT, the time 
resolution is about 110 ps. If the particle crosses the scintillating bar farther away from a 
PMT, the resolution worsens [49] . 

At its best performance (110 ps resolution) the TOF can distinguish charged pions and kaons 
with momenta p < 1.6 GeV/c with a precision of two standard deviations. This information 
is complementary to the particle separation due to the dE/dx effect measured by the COT. 
Figure 3.11 shows the particle separation achieved by the TOF and by the COT, superimposed. 
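
The two-standard-deviation figure quoted above can be checked with a short calculation. The following is a minimal sketch, not the CDF reconstruction code: it assumes a straight flight path equal to the TOF radius of 1.5 m (a track at η = 0, neglecting curvature in the magnetic field), rounded PDG masses, and function names chosen here for illustration.

```python
import math

C_M_PER_NS = 0.299792458  # speed of light in m/ns

# Particle masses in GeV/c^2 (rounded PDG values)
MASS = {"pion": 0.13957, "kaon": 0.49368, "proton": 0.93827}


def flight_time_ns(p_gev, mass_gev, path_m):
    """Time of flight for momentum p (GeV/c), mass m (GeV/c^2) and path L (m)."""
    # t = (L / c) * (E / p) = (L / c) * sqrt(1 + (m / p)^2)
    return path_m / C_M_PER_NS * math.sqrt(1.0 + (mass_gev / p_gev) ** 2)


def separation_sigma(p_gev, m1, m2, path_m=1.5, resolution_ns=0.110):
    """Flight-time difference of two mass hypotheses in units of the resolution."""
    dt = abs(flight_time_ns(p_gev, m1, path_m) - flight_time_ns(p_gev, m2, path_m))
    return dt / resolution_ns


# Kaon/pion separation at the momentum quoted in the text
n_sigma = separation_sigma(1.6, MASS["kaon"], MASS["pion"])
print(f"K/pi separation at 1.6 GeV/c: {n_sigma:.1f} sigma")
```

Under these assumptions the kaon arrives roughly 0.2 ns after the pion at 1.6 GeV/c, which is about twice the best-case 110 ps resolution. The separation grows quickly at lower momentum because the time difference scales approximately as 1/p².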

3.3.5 The Calorimeters 

Besides the trajectory and momentum of a given particle, one needs also to determine its 
energy in order to reconstruct an event fully. This role is played by the calorimeter systems. 

The calorimeter detectors in CDF II are "sampling" calorimeters. They have a sandwich 
structure, where dense absorbing material alternates with scintillating material. As particles 
pass through the absorbing material, they develop showers of secondary particles. These 
secondary particles emit light as they pass through the scintillating material. The light is 
detected and measured by photomultiplier tubes (PMT). By adding up the energies measured 
by all the PMTs in the calorimeter systems, the energy of the initial particle is measured. 

Figure 3.11: Time of Flight subdetector performance in distinguishing pions, kaons and 
protons, superimposed on the COT performance for the same task [50]. Credit image to the 
CDF collaboration. 

Each calorimeter is made of two parts. The innermost part is an electromagnetic calorimeter 
(EMCAL), which measures the energies of electromagnetic objects (electrons, positrons and 
photons). The EMCAL has a large number of radiation lengths (X_0) and a small number of 
interaction lengths. The outermost part is the hadronic calorimeter (HADCAL), which 
measures the energy of hadrons, such as pions, other mesons and b hadrons. The HADCAL has a large 
number of interaction lengths. Electromagnetic particles start to shower immediately as 
they enter the calorimeter and most of their shower is contained in the EMCAL. On the 
other hand, hadronic objects deposit very little energy in the EMCAL and most of their 
energy in the HADCAL. This structure makes it possible to distinguish between 
electromagnetic objects and hadrons. 

Each calorimeter has a projective geometry, meaning that it is divided in η and φ towers 
that point towards the interaction region. That way a particle entering a calorimeter tower 
tends to spend most of its trajectory in that same tower. The calorimeter systems have a 
2π coverage in φ and |η| < 3.6 in pseudorapidity. 

The calorimeter system is made up of three parts spanning different geometric regions: the 
central calorimeter in the barrel region (both electromagnetic, CEM, and hadronic, CHA), 
the plug calorimeter in the forward region (both electromagnetic, PEM, and hadronic, PHA) 
and the end wall calorimeter in between the central and plug calorimeters (only hadronic, 
WHA). 

Figure 3.12 shows a schematic view of the calorimeter detectors in one of the CDF 
barrels. 




Figure 3.12: Schematic view of the calorimeter detectors in one of the CDF barrels (PEM, 
CEM, CHA, WHA, PHA). Credit image to the CDF collaboration. 



Central Electromagnetic Calorimeter (CEM) 

The CEM [51] covers the region |η| < 1.1. The CEM has 24 towers in φ and 10 towers in η. 
It is a sampling calorimeter with lead as the absorbing material alternated with scintillating 
material. It has 18 radiation lengths (X_0). The energy resolution of the CEM is 

\frac{\sigma_E}{E} = \frac{13.5\%}{\sqrt{E_T\,(\mathrm{GeV})}} \oplus 2\%,    (3.15) 

where the notation \oplus represents the sum in quadrature of the stochastic and constant terms. 

Figure 3.13 shows a schematic view of a single wedge from the CEM calorimeter. 

Central Electromagnetic Shower Maximum Detector (CES) 

The goal of the CES [51] is to improve the position measurement of electromagnetic showers in 
the CEM. To achieve this goal, the CES is a proportional chamber with wire and strip readout 
located inside the CEM at the position where on average the showers created by electrons, 
positrons and photons have a maximum number of particles (at a depth of 6 X_0). 
This is called the shower maximum and it is reflected in the name of the detector. 
For an electromagnetic object with an energy of 50 GeV, the position resolution achieved 
by the CES is 0.2 cm. 

Figure 3.13: Schematic view of a single wedge from the CEM calorimeter. Credit image to 
the CDF collaboration. 

Plug Electromagnetic Calorimeter (PEM) 



The PEM covers the region 1.1 < |η| < 3.6. It is formed of 24 (48) towers in φ for 
the inner (outer) groups and 12 towers in η for all groups. Like the CEM, it is a 
lead-scintillator sampling calorimeter. Its thickness is slightly larger (21 X_0). The energy 
resolution of the PEM is 

\frac{\sigma_E}{E} = \frac{14.4\%}{\sqrt{E\,(\mathrm{GeV})}} \oplus 0.7\%.    (3.16) 

Plug Electromagnetic Shower Maximum Detector (PES) 



Similar to the CES, the PES [52] measures precisely the position of the maximum of electromagnetic 
showers in the PEM. As for the CES, the PES is located at a depth of 6 radiation 
lengths (6 X_0) inside the PEM. The PES is made of two layers of 5 mm wide scintillator strips and 
each layer is at a 45° angle relative to the other. 

The properties of the electromagnetic calorimeters at CDF are summarized in Table 3.3. 

Table 3.3: Summary of various properties of the electromagnetic calorimeter subsystems at 
CDF. 

System   η coverage          Energy resolution (%)   Thickness   Absorber
CEM      0.0 < |η| < 1.1     13.5/√E_T ⊕ 2           18 X_0      3.2 mm lead
PEM      1.1 < |η| < 3.6     14.4/√E ⊕ 0.7           21 X_0      4.5 mm lead



Central Hadronic Calorimeter (CHA) 



The CHA [53] covers the region |η| < 0.9 and uses iron as a showering material 
that alternates with scintillator material. The CHA is segmented in 24 towers in φ and 8 
towers in η. It is located radially just outside the CEM. Each CHA tower is made 
of 32 layers, with a total of 4.7 interaction lengths (λ_I). The energy resolution of the CHA is 

\frac{\sigma_E}{E} = \frac{50\%}{\sqrt{E\,(\mathrm{GeV})}} \oplus 3\%.    (3.17) 

Plug Hadronic Calorimeter (PHA) 

The PHA [53] covers the region 1.3 < |η| < 3.6. It consists of 23 layers of sandwiched 
iron (as absorbing material) and scintillating material. The energy resolution of the PHA 
is 

\frac{\sigma_E}{E} = \frac{80\%}{\sqrt{E\,(\mathrm{GeV})}} \oplus 5\%.    (3.18) 

Figure 3.14 shows a schematic view of one of the two forward (plug) calorimeters 
(PEM and PHA). 

Wall Hadronic Calorimeter (WHA) 

The WHA [53] covers the region 0.7 < |η| < 1.3. It extends the η coverage between the 
central and plug hadronic calorimeters. It is also an iron-scintillator sampling calorimeter. 
There are 15 iron layers, each 5.0 cm thick, alternating with 1.0 cm thick scintillator. The energy 
resolution of the WHA is 

\frac{\sigma_E}{E} = \frac{75\%}{\sqrt{E\,(\mathrm{GeV})}} \oplus 4\%.    (3.19) 
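
To make the quadrature sum in the resolution formulas concrete, the sketch below evaluates the relative energy resolution of each calorimeter at a given energy. It is an illustration only: the stochastic and constant terms are the ones quoted in this section, the dictionary and function names are chosen here, and the same energy variable is used for all subsystems (for the CEM, Table 3.3 quotes the stochastic term in terms of E_T).

```python
import math

# (stochastic term in %, constant term in %) as quoted in this section
RESOLUTION_TERMS = {
    "CEM": (13.5, 2.0),
    "PEM": (14.4, 0.7),
    "CHA": (50.0, 3.0),
    "PHA": (80.0, 5.0),
    "WHA": (75.0, 4.0),
}


def relative_resolution_percent(system, energy_gev):
    """sigma_E / E in percent: stochastic / sqrt(E) added in quadrature to constant."""
    stochastic, constant = RESOLUTION_TERMS[system]
    return math.hypot(stochastic / math.sqrt(energy_gev), constant)


# Compare all subsystems at the same illustrative energy of 50 GeV
for name in RESOLUTION_TERMS:
    print(f"{name}: {relative_resolution_percent(name, 50.0):.2f} %")
```

At low energy the stochastic term dominates, while at high energy the resolution approaches the constant term. This is why the electromagnetic calorimeters, with their much smaller stochastic terms, measure a 50 GeV deposit several times more precisely than the hadronic ones.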

3.3.6 The Muon Chambers 



Although nearly all particles are absorbed by the calorimeter system, muons pass through 
the calorimeters as minimum ionizing particles and then exit the calorimeter system¹⁰. The 



10 Neutrinos are the only known particles that leave the detector completely undetected. 


Figure 3.14: Schematic view of one of the two forward (plug) calorimeters (PEM and PHA). 
Credit image to the CDF collaboration. 



outermost subdetector of CDF is a muon system. It is made out of single wire drift chambers 
and scintillator counters for fast timing, located radially just outside the calorimeter system. 

There are various muon subsystems as follows: the Central Muon Detector (CMU), the 
Central Muon uPgrade Detector (CMP), the Central Scintillator uPgrade (CSP), the Central 
Muon extension Detector (CMX), the Central Scintillator extension (CSX), the Toroid 
Scintillator Upgrade (TSU), the Barrel Muon Upgrade (BMU) and the Barrel Scintillator 
Upgrade (BSU). 

The CMU, CMP and CSP systems cover the range |η| < 0.6, the CMX and CSX 
systems cover the range 0.6 < |η| < 1.0 and the TSU, BMU and BSU subsystems cover 
the range 1.0 < |η| < 2.0. The muon subsystems can be seen in Figure 3.15. 

The innermost muon system is CMU [55] . It was built for CDF I and is located just outside 
the CHA calorimeter, at a radius of 350 cm and arranged in 12.6° wedges in φ. Each 
wedge has three layers (stacks) and each stack has four rectangular drift chambers. Each 
drift chamber is filled with the same gaseous composition as the COT (argon-ethane-alcohol 
49.5:49.5:1) and has a 50 μm sense wire in the middle of the cell, parallel to the z axis. 

The CMU is followed by another muon system, the CMP [SB]. The CMP has rectangular 
geometry and is formed of four layers of drift chambers of the same constitution as above. 
In preparation for Tevatron Run II, a 60 cm thick layer of steel was added. The p_T threshold 
for the CMU (CMP) is 1.4 (2.2) GeV/c. 

Figure 3.15: Diagram in the η-φ plane of the muon systems at CDF: CMU, CMP, CMX and 
BMU muon detectors. The BMU detector is referred to in this diagram as IMU. Credit image 
to the CDF collaboration. 



On the outer side of CMP there is the CSP [S^, formed of a single rectangular layer of 
scintillating material with a wave guide to transport the light to a PMT. The CSP fast 
response is used in triggering. 

The CMX muon system is located at each edge between the CDF barrel and forward regions. 
It has a conical geometry with drift chambers similar to those of the CMP. It also has a scintillating 
system called the CSX, similar to the CSP. The CMX system covers 360° with 15 wedges 
in φ. Each wedge is formed of eight layers of drift chambers in the radial direction. 

Various properties of the muon subsystems are summarized in Table 3.4. 



Based on the timing information provided by the individual drift chambers, short "tracks" 
of ionization in the drift chambers (called "stubs") are reconstructed in the muon chambers. 
At CDF a muon candidate is formed when a track reconstructed by the COT is matched to 
a stub in the muon system. In the reconstruction process, a χ² in the φ coordinate (χ²_φ) is 
computed for the match between a COT track and a muon stub. To ensure good quality of 
muon candidates, an upper limit is set on χ²_φ. 

Table 3.4: Summary of various properties of the muon subsystems at CDF. 

                             CMU      CMP/CSP   CMX/CSX
η coverage                   0-0.6    0-0.6     0.6-1.0
Minimum p_T [GeV/c]          1.4      2.2       1.4
Drift Tubes
  Thickness [cm]             2.68     2.5       2.5
  Width [cm]                 6.35     15        15
  Length [cm]                226      640       180
  Maximum drift time [μs]    0.8      1.4       1.4
Scintillators
  Thickness [cm]             N/A      2.5       1.5
  Width [cm]                 N/A      30        30-40
  Length [cm]                N/A      320       180

3.3.7 The Trigger System and Data Acquisition 

The trigger system is indispensable at each collider physics experiment because the collision 
rate is much larger than the maximum rate at which events can be stored for analysis. At the 
Tevatron the theoretical bunch crossing (collision) rate is 2.5 MHz for 36 bunches of protons 
and 36 bunches of antiprotons. In practice, the pp̄ collision rate is 1.7 MHz. Even 
so, this rate is much larger than the 50 Hz rate at which events¹¹ can be saved to magnetic tape¹². 
Also, on average an event needs about 250 kB of data storage. Even if the rate of copying 
events to magnetic tape were sufficient, there would not be enough storage space to save 
all these events. Nor would it be easy to analyze all these events afterwards. 

There are three essential conditions that the CDF trigger system has to meet. First, the 
trigger has to be quick enough to make a decision about an event (whether to save it or 
reject it) before the next event comes in (in other words, there should be zero dead time). 
Second, the Tevatron collider imposes that a new event comes in every 396 ns. Third, the 
magnetic tape can only save about 50 events per second. 

In conclusion, the goal of the CDF trigger system is to analyze 1.7 million events every 
second and to decide which 50 of them will be saved to tape. 
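
The orders of magnitude behind these requirements follow from simple arithmetic on the numbers quoted above. The sketch below is only a back-of-the-envelope check; the variable names are chosen here, and 1 kB is assumed to mean 1000 bytes.

```python
BUNCH_SPACING_NS = 396        # time between bunch crossings
COLLISION_RATE_HZ = 1.7e6     # collision rate in practice
TAPE_RATE_HZ = 50.0           # events written to tape per second
EVENT_SIZE_BYTES = 250e3      # about 250 kB per event (1 kB = 1000 bytes assumed)

# Theoretical crossing rate implied by the bunch spacing (about 2.5 MHz)
max_crossing_rate_hz = 1e9 / BUNCH_SPACING_NS

# Bandwidth needed to store every event versus what is actually written
untriggered_bandwidth = COLLISION_RATE_HZ * EVENT_SIZE_BYTES   # bytes per second
tape_bandwidth = TAPE_RATE_HZ * EVENT_SIZE_BYTES               # bytes per second

# Overall rejection the three trigger levels must achieve together
rejection_factor = COLLISION_RATE_HZ / TAPE_RATE_HZ

print(f"crossing rate: {max_crossing_rate_hz / 1e6:.2f} MHz")
print(f"no trigger:    {untriggered_bandwidth / 1e9:.0f} GB/s")
print(f"to tape:       {tape_bandwidth / 1e6:.1f} MB/s")
print(f"rejection:     1 event kept in {rejection_factor:.0f}")
```

Storing every event would require several hundred gigabytes per second, whereas the tape rate corresponds to about 12.5 MB/s. The trigger therefore has to keep roughly one event in 34,000.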

Figure 3.16 shows a diagram of the three trigger levels in the CDF trigger system. 

CDF Trigger Levels 

There are three trigger levels at CDF, and each needs a certain amount of time to reach a 
decision whether to reject an event or send it to the next trigger level. 

The first trigger level (L1) uses hardware-based custom electronics to try to reconstruct 
physics objects using only a subset of CDF information. L1 makes a decision by object 
count and energy values. The second trigger level (L2) also uses custom-designed hardware 
to reconstruct better physics objects. This information is passed to programmable processors 
to make decisions. The third trigger level (L3) uses all CDF information and a PC farm of 
about 500 CPUs to do a full event reconstruction and decide if the event is kept. 

11 In CDF, an event is defined as all the recorded particle activity in the detector during a bunch crossing. 
On average, more than one pp̄ hard scattering happens during an event, with multiple primary vertices 
present in the event (pile-up). 

12 One might be surprised that in the era of CDs, DVDs and USB storage keys, collision data is saved 
on old-fashioned magnetic tapes. The motivation is that the latter preserve the data for 
more years and at lower cost than the former. 

Figure 3.16: Diagram of the CDF trigger system (Level 1, Level 2, Level 3, Mass storage). 
Credit image to the CDF collaboration. 

Trigger Paths 



A trigger path represents a sequence of requirements that an event has to pass at L1, L2 
and then at L3. About 100 trigger paths are implemented by the CDF II trigger system. 
An event will be saved to tape if it passes the requirements of at least one trigger path. 

L1 Trigger Level 



L1 reduces the event rate from 1.7 MHz to about 40 kHz, thus discarding about 98% of 
events. In order to have enough time to analyze each event, L1 uses a pipeline and 42 buffers 
so that L1 has 2 μs to analyze each event. 

The input for L1 comes from the tracking, calorimeter and muon systems. The decision to 
send an event to L2 is taken using the number and energy values of electron, muon and jet 
candidates, as well as the value of missing transverse energy¹³, or the kinematic properties 
of a pair of tracks. 

A subsystem called the eXtremely Fast Tracker (XFT) [37] reconstructs high-p_T tracks 
(p_T > 1.5 GeV/c) at L1 using COT information. The XFT uses a digitized readout of ionization 
(hits) in the COT. The hits from the COT superlayers are combined into segments. Pattern 
recognition algorithms group segments into tracks that cross the entire COT. The tracks 
reconstructed by the XFT are combined with information about energy deposits in the calorimeter 
(muon) systems to produce L1 electron (muon) candidates. 

Energies of jets, electron and photon candidates, as well as missing transverse energy and 
sums of jet energies in an event are approximated using clusters of energy in the calorimeter 
systems. 

The only muon system used at L1 is the CMU. 

L2 Trigger Level 

Events accepted by L1 are sent to L2, where the event rate is reduced from 40 kHz to about 
500 Hz, thus discarding about 99% of events passed to L2. L2 uses 4 buffers in order to have 
enough time to analyze each event (about 20 μs). 
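
The rejection percentages quoted for L1 and L2 can be reproduced from the input and output rates of each level. The sketch below is illustrative: it reads the L2 output as about 500 Hz, takes the 50 Hz tape rate quoted earlier in this section as the L3 output, and uses names chosen here.

```python
# (level name, output rate in Hz); the first entry is the rate entering L1
LEVEL_RATES_HZ = [("input", 1.7e6), ("L1", 40e3), ("L2", 500.0), ("L3", 50.0)]


def rejection_fractions(rates):
    """Fraction of incoming events discarded by each trigger level."""
    fractions = {}
    for (_, rate_in), (level, rate_out) in zip(rates, rates[1:]):
        fractions[level] = 1.0 - rate_out / rate_in
    return fractions


fractions = rejection_fractions(LEVEL_RATES_HZ)
for level, fraction in fractions.items():
    print(f"{level}: discards {100.0 * fraction:.1f}% of its input")
```

With these rates L1 discards about 97.6% of its input and L2 about 98.8%, in line with the rounded values of 98% and 99% given in the text, while L3 removes a further 90% of the events it receives.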

At L2 a better event reconstruction is performed. The tracking is improved by taking 
into account the silicon tracking information as well. Also, better track reconstruction and 
calorimeter clustering for jet finding algorithms are used. The reconstruction of electron 
and photon candidates is improved by also taking into account the information from 
the central calorimeter shower maximum subdetector (CES)¹⁵. Also, secondary vertices 
are reconstructed at L2 inside certain jets using the Silicon Vertex Tracker (SVT) trigger 
subsystem [58]¹⁷. Finally, all muon systems are used to combine hits in the muon chambers 
with the L1 XFT reconstructed track to produce L2 muon candidates. 

13 As neutrinos escape the detector without being detected, the energy they carry appears as missing energy 
in the event. Since the partons in protons or antiprotons have a well measured energy in the transverse 
plane but a distribution of energy along the z axis, we can only say that the vector sum of energies should 
be zero in the transverse plane, but we are not able to say the same for the z axis. This is why we refer only 
to missing transverse energy. Also note that this happens at all hadron colliders, LHC included, and that at 
lepton colliders, such as the PEP-II collider (for the BABAR experiment) and the KEKB collider (for the 
Belle experiment), the energy is well measured in all directions and they can refer to missing energy, not 
only missing transverse energy. 

14 Jets are reconstructed at L2 with the help of the L2 cluster finder (L2CAL), which starts from a seed 
calorimeter tower of 3 GeV and adds adjacent towers with energies larger than 1 GeV. 

Figure 3.17 shows a block diagram of the L1 and L2 trigger levels at CDF. 

Figure 3.17: Diagram of the first (L1) and second (L2) trigger levels at CDF. Credit image 
to the CDF collaboration. 



L3 Trigger Level 



If an event is accepted at L2, it is sent to L3, where a full detector readout and a full 
reconstruction are performed using the computer farm. If an event passes the L3 requirements, it 
is sent to the Consumer Server/Logger (CSL), which is the final component of the CDF data 
acquisition. 

Consumer Server/Logger 

The CSL categorizes events by trigger path, writes to disk those that pass at least one of the 
trigger paths and sends a fraction of these events to online processors for online monitoring 
of data quality. Figure 3.18 shows a diagram of the Consumer Server/Logger. 
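
The trigger-path logic described in this section (an event is kept if it satisfies the L1, L2 and L3 requirements of at least one path, and is then categorized by the paths it fired) can be sketched as follows. This is a toy model and not the CDF trigger software: the class, the path names, the event fields and all thresholds are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, float]
Requirement = Callable[[Event], bool]


@dataclass
class TriggerPath:
    name: str
    l1: Requirement
    l2: Requirement
    l3: Requirement

    def accepts(self, event: Event) -> bool:
        # `and` short-circuits, mirroring the fact that L2 only sees events
        # accepted by L1, and L3 only those accepted by L2.
        return self.l1(event) and self.l2(event) and self.l3(event)


def fired_paths(event: Event, paths: List[TriggerPath]) -> List[str]:
    """Names of all trigger paths whose three levels accept the event."""
    return [path.name for path in paths if path.accepts(event)]


# Toy paths with invented thresholds (GeV), for illustration only
TOY_PATHS = [
    TriggerPath(
        "TOY_ELECTRON",
        l1=lambda e: e.get("em_et", 0.0) > 8.0,
        l2=lambda e: e.get("em_et", 0.0) > 16.0,
        l3=lambda e: e.get("em_et", 0.0) > 18.0,
    ),
    TriggerPath(
        "TOY_MUON",
        l1=lambda e: e.get("mu_pt", 0.0) > 4.0,
        l2=lambda e: e.get("mu_pt", 0.0) > 8.0,
        l3=lambda e: e.get("mu_pt", 0.0) > 18.0,
    ),
]

event = {"em_et": 25.0, "mu_pt": 2.0}
paths = fired_paths(event, TOY_PATHS)
saved_to_tape = len(paths) > 0  # kept if at least one path fired
print(paths, saved_to_tape)
```

Keeping the list of fired paths, rather than a single yes/no decision, matches the role of the CSL, which sorts the stored events by trigger path.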

15 This way the resolution for electron and photon showers is better than the cluster location. This 
information is combined with the tracking information to reconstruct better electron candidates. 

16 Jets originating from a b quark contain a secondary vertex displaced by about 3 mm from the primary 
pp̄ interaction vertex due to the fact that b quarks live longer than other quarks before decaying. The decay 
products appear to emerge from the same vertex, which is called a secondary vertex. 

17 The information from the SVXII subdetector is combined with the L1 XFT reconstructed track to 
reconstruct both a more precise track and a secondary vertex inside a jet that originates from a b 
quark. 

Figure 3.18: Design diagram of the Consumer Server/Logger. Credit image to the CDF 
collaboration. 



As a graduate student on CDF, I did three one-week online data quality monitoring shifts¹⁸ 
and one three-month online data acquisition and detector control shift¹⁹. 

18 At CDF these shifts are called "Consumer Operator shifts". 
19 At CDF these shifts are called "ACE shifts". 

3.4 Summary 

In this chapter we have presented the experimental infrastructure used for the WH associated 
production direct search presented in this thesis. We started by introducing the US 
national particle physics laboratory, Fermilab. We then presented in detail the Fermilab 
accelerator complex that accelerates and collides protons and antiprotons at a centre-of-mass 
energy of √s = 1.96 TeV with the help of a chain of particle accelerators formed by the 
proton source, the Cockcroft-Walton, Linac, Booster, Debuncher, Recycler, Main Injector 
and the Tevatron accelerator. We continued with the detailed description of the Collider 
Detector at Fermilab, the apparatus that records the elementary particles produced in the 
proton-antiproton collisions delivered by the Tevatron accelerator. We first introduced the 
CDF coordinate system and the Cherenkov Luminosity Counter. We then described the 
several subdetectors, structured as layers of CDF, which measure several properties of the 
elementary particles. The first layer is formed by the tracking system, which measures 
precisely the momenta of elementary particles. The second layer is the calorimeter system, 
which measures precisely the energy of elementary particles. The third layer is dedicated 
to the muons. From a total of 2.5 million collisions per second, only about 100 collisions 
per second are chosen to be stored by the trigger system, which is deployed by way of three 
levels. 

In the next chapter we will present how the several detector systems are used in order to 
reconstruct the elementary particles used in our WH analysis. 

Chapter 4 

Object Identification 



The energy deposits in the CDF subdetectors are digitized and transformed into electronic 
signals that are then reconstructed into high-level objects such as primary vertices, tracks, 
calorimeter clusters and muon stubs. Physics objects such as electrons, muons, jet candidates 
and missing transverse energy are reconstructed by applying cuts, or selection criteria, on 
high-level objects. In this analysis we present a WH → ℓνbb̄ search. The final state therefore 
contains a charged lepton (an electron or a muon¹), missing transverse energy due to the 
undetected neutrino, and two jets originating from bottom (b) quarks. 

A primary vertex represents the reconstructed position of the primary pp̄ collision that 
produced primary particles. In the case of our signal, these are the W boson and the Higgs 
boson. 

A track represents the reconstructed trajectory of a charged particle in the tracking systems 
from the particle's electromagnetic interactions in these detector systems. 

A calorimeter cluster is a collection of adjacent calorimeter towers where energy is deposited 
due to an incoming particle, either electrically charged or electrically neutral. 

A muon stub is a collection of energy deposits in adjacent muon chambers in the muon 
systems. 

This chapter presents the reconstruction techniques for the basic physics objects used in 
this analysis, such as tracks, primary vertices and calorimeter clusters. Next comes the 
reconstruction of high-level physics objects such as electron, muon and jet candidates, as well 
as missing transverse energy, which are used by the event selection. Finally, two algorithms 
used to identify jets originating from b quarks are described. 

1 We do not consider the tau lepton final state directly. However, tau leptons that decay leptonically to 
electrons or muons plus neutrinos contribute to our signal, background and data selection. 

4.1 Track Identification 



Tracks are reconstructed at CDF using information from the tracking systems. They are 
used in primary vertex reconstruction, charged lepton identification and identification of 
jets originating from b quarks. A track represents a reconstructed three dimensional helical 
trajectory of a charged particle passing through the solenoidal magnetic field in the tracking 
systems and is described by the following parameters. 

The half-curvature of the trajectory (C) is defined as 

C = \frac{Q}{2\rho},    (4.1) 

where Q is the electric charge of the track and ρ is the radius of the circle formed by the 
projection of the helical trajectory on the transverse plane. C carries the same sign as the 
electric charge of the particle and is inversely proportional to the transverse momentum 
(p_T) of the track. 
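
As a numerical illustration of the inverse proportionality between C and p_T, the sketch below converts a half-curvature into a charge sign and a transverse momentum. It assumes the convention C = Q/(2ρ) for a unit-charge particle, the standard relation p_T [GeV/c] = 0.2998 · B [T] · ρ [m], and a solenoid field of 1.4 T, a value that is not quoted in this section. The function name is chosen here for illustration.

```python
def charge_and_pt(half_curvature_per_m, b_tesla=1.4):
    """Charge sign and p_T (GeV/c) of a unit-charge track from C (in 1/m)."""
    charge = 1 if half_curvature_per_m > 0 else -1
    # |C| = 1 / (2 * rho)  =>  rho = 1 / (2 * |C|)
    rho_m = 1.0 / (2.0 * abs(half_curvature_per_m))
    # p_T [GeV/c] = 0.299792458 * B [T] * rho [m] for |Q| = 1
    return charge, 0.299792458 * b_tesla * rho_m


# A negative track whose transverse projection has a radius of 1 m
charge, pt = charge_and_pt(-0.5)
print(charge, round(pt, 3))
```

Halving the half-curvature doubles the radius and therefore doubles p_T, so high-p_T tracks are nearly straight and their momentum measurement relies on resolving a very small curvature.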

cot θ is the cotangent of the polar angle of the trajectory at the closest approach to the 
primary interaction vertex. 

The impact parameter (d_0) is the minimal distance in the transverse plane between the 
trajectory and the primary interaction vertex. It is given by the expression 

d_0 = Q \left( \sqrt{x_0^2 + y_0^2} - \rho \right),    (4.2) 

where x_0 and y_0 are the coordinates in the transverse plane of the centre of the circle of the 
helix. 

φ_0 is the azimuthal angle of the trajectory at the closest approach to the primary interaction 
vertex. 

z_0 is the z coordinate position of the trajectory at the closest approach to the primary 
interaction vertex. 

4.1.1 Tracking Algorithms 

There are three tracking algorithms that we use in this analysis: COT stand-alone tracking, 
Outside-In (OI) tracking and Inside-Out (IO) tracking. 







COT stand-alone tracking 

The COT stand-alone tracking algorithm uses only COT information (with no information 
from the silicon detectors). Electromagnetic interactions inside the COT cells or silicon 
detectors are called hits. First, hits in each superlayer are fit together to reconstruct short 
tracks. Then, short tracks from all the superlayers are fit together to form a COT 
stand-alone track. The details of this latter fit are the following. Since axial and stereo superlayers 
alternate, first a fit is performed where only the axial superlayers are considered, in order 
from the outermost one to the innermost one. Then, stereo layers are added and a new 
fit is performed. The final COT stand-alone track needs to have hits in at least 2 axial and 
2 stereo superlayers. This tracking algorithm is used in the central region of the detector, 
corresponding to |η| < 1.1. 

Outside-In tracking 

The Outside-In (OI) tracking algorithm starts with a COT stand-alone track (called a seed 
track) and adds high-resolution hits from silicon detector information. First the axial silicon 
hits are added to the COT stand-alone track. Then, for each silicon wafer, every hit on a 
stereo silicon strip is added to a different copy of the current track. After the last silicon 
wafer has been processed, there are a multitude of track candidates. The Outside-In track 
is chosen to be that with the largest number of silicon hits and with the lowest χ² over the 
number of degrees of freedom. This tracking algorithm is used in the central region of the 
detector (|η| < 1.1). 
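
The final choice among the Outside-In candidates can be sketched as a simple ranking. The text does not say how the two criteria are combined, so the sketch below assumes a lexicographic ordering: the number of silicon hits is compared first and χ² per degree of freedom only breaks ties. The class and function names are illustrative and not taken from the CDF software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackCandidate:
    n_silicon_hits: int
    chi2: float
    ndf: int  # number of degrees of freedom of the fit


def select_outside_in_track(candidates: List[TrackCandidate]) -> TrackCandidate:
    """Most silicon hits first; among equals, the lowest chi2 / ndf."""
    return max(candidates, key=lambda t: (t.n_silicon_hits, -t.chi2 / t.ndf))


candidates = [
    TrackCandidate(n_silicon_hits=4, chi2=6.0, ndf=10),
    TrackCandidate(n_silicon_hits=5, chi2=30.0, ndf=12),
    TrackCandidate(n_silicon_hits=5, chi2=14.4, ndf=12),
]
best = select_outside_in_track(candidates)
print(best)
```

In this example the two five-hit candidates outrank the four-hit one despite its better fit, and the five-hit candidate with χ²/ndf = 1.2 is selected over the one with χ²/ndf = 2.5.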

Inside-Out tracking 

The Inside-Out (IO) tracking algorithm is needed for tracking in the forward regions of 
CDF. The algorithm starts with hits in at least three layers of the silicon detector. Then 
hits in the COT that have not already been used by the COT stand-alone and OI algorithms 
are added to produce the final IO track. 

4.2 Primary Vertex Identification 

CDF uses two main algorithms to reconstruct primary interaction vertices (PV). The locus 
of all PVs represents the beamline, or the luminous region of the detector. 

4.2.1 Primary Vertex Reconstruction Algorithms 

The ZVertexFinder algorithm [52] takes as an input a set of tracks passing minimum quality 
requirements based on the number of silicon and COT hits. The algorithm computes an 
error-weighted average (z_PV) of the z_0 coordinates of these tracks, which is given by 

z_{PV} = \frac{\sum_i z_0^i / \sigma_i^2}{\sum_i 1 / \sigma_i^2},    (4.3) 

where z_0^i is the z_0 coordinate of track i and σ_i is its uncertainty. 
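
An error-weighted average of this kind takes only a few lines to compute. The sketch below is a generic inverse-variance weighted mean, assumed here to be the form meant by the text, and is not the ZVertexFinder code itself; the function name and the returned uncertainty are additions for illustration.

```python
def error_weighted_mean(z_values, z_errors):
    """Inverse-variance weighted mean of z_values and its uncertainty."""
    weights = [1.0 / (err * err) for err in z_errors]
    total_weight = sum(weights)
    mean = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    return mean, total_weight ** -0.5


# Two tracks: a precise one at z = 0 cm and a three times less precise one at z = 10 cm
z_pv, z_pv_error = error_weighted_mean([0.0, 10.0], [1.0, 3.0])
print(round(z_pv, 3), round(z_pv_error, 3))
```

Because the weights go as 1/σ², the track with a three times larger uncertainty counts nine times less, so the average is pulled only 1 cm away from the precise track.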



The algorithm outputs a collection of PVs characterized by their own quality, track 
multiplicity, z position, z position error and p_T. However, PVs output by the ZVertexFinder 
algorithm present no x and y position information. Each reconstructed PV corresponds 
either to a hard scattering or to an underlying event of a hard scattering. It may also happen 
that a physical PV gets reconstructed into two PVs due to tracking resolution. The PV 
transverse momentum (p_T,PV) is defined as the sum of the transverse momenta of its tracks 
(Σ_tracks p_T) and conveys the information of how energetic a PV is. Typical WH PV 
candidates have Σ_tracks p_T on the order of 100 GeV. The PV quality conveys the information 
of how well the PVs are reconstructed. PV quality is based on the track multiplicity, as 
shown in Table 4.1. 



Table 4.1: Primary Vertex Quality Criteria.

Criterion                      Quality Value
Number of Si tracks > 3        1
Number of Si tracks > 6        3
Number of COT tracks > 1       4
Number of COT tracks > 2       12
Number of COT tracks > 4       28
Number of COT tracks > 6       60



The PV with the best chances to be the PV of the interaction triggered on is considered
the event PV. The CDF collaboration previously used a run-averaged beamline position as the
event PV. In 2003 CDF developed an algorithm called PrimeVertexFinder [60] [61] that
reconstructs a 3D event PV on an event-by-event basis. This algorithm allowed CDF to
improve the efficiency of identifying jets as originating from a bottom quark (b-tagging) for
shorter secondary vertex displacements and to reduce the systematic uncertainties due to
the run-dependent beam position variation. PrimeVertexFinder takes as an input a set of
good quality tracks in good agreement ($\chi^2 < 10$) with a seed vertex (usually the beamline
position or one of the PVs output by ZVertexFinder). These tracks are fitted to a new 3D PV
and checked to see whether they are still in good agreement with the new PV. Tracks
with $\chi^2 > 10$ are rejected. The remaining tracks are fitted to a new 3D PV. The
procedure is iterated until all remaining tracks have $\chi^2 < 10$ with respect to the latest
PV. The last 3D PV becomes the event PV. 3D position information is crucial for b-tagging
techniques that use information about the bottom quark lifetime.
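
The iterative fit-and-reject logic described above can be sketched as follows. This is a toy illustration, not the CDF implementation: each track is reduced to a point with an isotropic uncertainty, the "fit" is a weighted mean of those points, and the $\chi^2$ is the squared distance to the vertex in units of that uncertainty. All names and the data layout are hypothetical.

```python
def fit_vertex(tracks):
    """Weighted mean position of the track points; weights are 1/sigma^2."""
    weights = [1.0 / t["sigma"] ** 2 for t in tracks]
    total = sum(weights)
    return tuple(
        sum(w * t["pos"][k] for w, t in zip(weights, tracks)) / total
        for k in range(3)
    )


def chi2_to_vertex(track, vertex):
    """Squared distance to the vertex in units of the track uncertainty."""
    d2 = sum((track["pos"][k] - vertex[k]) ** 2 for k in range(3))
    return d2 / track["sigma"] ** 2


def prune_and_fit(tracks, seed, chi2_max=10.0):
    """Keep tracks compatible with the vertex, refit, and repeat until stable."""
    kept = [t for t in tracks if chi2_to_vertex(t, seed) < chi2_max]
    while kept:
        vertex = fit_vertex(kept)
        survivors = [t for t in kept if chi2_to_vertex(t, vertex) < chi2_max]
        if len(survivors) == len(kept):
            return vertex, kept  # no track rejected: the fit has converged
        kept = survivors
    return None, []  # no track is compatible with the seed
```

The loop stops as soon as a refit rejects no further tracks, which mirrors the stopping condition stated in the text.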

A PV position is represented by ($x_{PV}$, $y_{PV}$, $z_{PV}$). A typical longitudinal width is
$\sigma_z = 29$ cm. The transverse profile is circular, with a width that is smaller at the
centre of the detector, $\sigma_{\perp, z = 0\,\mathrm{cm}} = 30\,\mu\mathrm{m}$, and larger at the
extremities, $\sigma_{\perp, z = 40\,\mathrm{cm}} = 50\,\mu\mathrm{m}$. Typical $x_{PV}$ and $y_{PV}$ are very
small, on the order of tens of microns. Event PV reconstruction is trusted only in the
luminous region ($|z_{PV}| < 60$ cm). Events with the event PV outside the luminous region
are rejected (luminous cut).

4.2.2 Primary Vertex Definition Studies 

In my Master of Science thesis [30] I compared two possible definitions for the primary
interaction vertex for high-$p_T$ events with the charged lepton + missing transverse energy + jets
signature (top quark pair production) at CDF. The analysis presented in this PhD thesis
also uses a signature of a charged lepton + missing transverse energy + jets (the WH search).
The only difference is that there are two jets in our case and four jets in the
$t\bar{t}$ case. The two possible primary vertex definitions were:

• The primary vertex with the closest z coordinate to the $z_0$ coordinate of the charged
lepton (electron or muon) reconstructed track.

• The primary vertex with the largest transverse momentum of all the primary vertices
with the z coordinate within 5 cm of the $z_0$ coordinate of the charged lepton (electron
or muon) reconstructed track.

The study concluded that both definitions are equally efficient and therefore confirmed that 
the definition used by CDF was the best possible one. 
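
The two definitions compared in that study can be written down as a short sketch. The vertex representation (a dictionary with keys "z" and "sum_pt") and the function names are assumptions made for illustration.

```python
def closest_in_z(vertices, lepton_z0):
    """Definition 1: the vertex whose z is closest to the lepton track z0."""
    return min(vertices, key=lambda v: abs(v["z"] - lepton_z0))


def highest_pt_near_lepton(vertices, lepton_z0, window_cm=5.0):
    """Definition 2: the highest sum-pT vertex within `window_cm` of the lepton z0."""
    nearby = [v for v in vertices if abs(v["z"] - lepton_z0) < window_cm]
    return max(nearby, key=lambda v: v["sum_pt"]) if nearby else None
```

The two definitions can disagree when a soft vertex lies slightly closer to the lepton track than the hard-scatter vertex, which is the situation the comparison was designed to probe.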

4.2.3 Primary Vertex Reconstruction at L1 Trigger Level Study

During my first year of PhD studies I performed a study within the Higgs Trigger Task
Force (HTTF) [62] at CDF. The goal of HTTF was to design new triggers to increase the
acceptance of Higgs boson events. As part of that effort, I evaluated whether the triggers
would benefit from knowing at L1 trigger level whether the primary interaction vertex
was in the east or west side of CDF. If the answer had been yes, a hardware-based
hit count would have been implemented to evaluate in which half of CDF more particle
activity was recorded. However, the study showed that this information would not have
changed the trigger efficiency significantly and therefore primary vertex identification was
not introduced at L1. Other efforts did prove worthwhile. For example, a
new missing transverse energy + jets trigger was designed, and we currently use it in our
analysis.

4.3 Calorimeter Clustering Algorithm 

High transverse momentum electrons, photons and jets leave energy deposits in adjacent
towers of the calorimeter systems. Together these towers form a calorimeter cluster, which
is reconstructed using a clustering algorithm. The first step is finding a seed tower that
has an energy deposit larger than a certain threshold value. Then neighbouring towers
with energy deposits larger than another, lower threshold are added to the cluster. The
total energy of the cluster is the sum of the energy deposits in each calorimeter tower. The
position of the cluster is computed as an energy-weighted average of the positions of each
tower in the cluster. Then, in order to improve the precision of the cluster position, the
calorimeter cluster is matched with a cluster in the shower maximum detector. The latter
type of cluster is built with a similar algorithm, but one that is optimized to achieve a better
position resolution.
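
The clustering steps above can be illustrated with a simplified sketch. It assumes towers indexed on a regular (ieta, iphi) grid, takes only the eight immediate neighbours of the seed, and ignores the wrap-around in $\phi$. The thresholds, key names and grid layout are hypothetical and do not reproduce the CDF code.

```python
def build_cluster(towers, seed_threshold, neighbour_threshold):
    """Build one cluster around the most energetic seed tower.

    `towers` maps (ieta, iphi) grid indices to a dict with keys "energy",
    "eta" and "phi". Returns None if no tower passes the seed threshold.
    """
    seeds = [idx for idx, t in towers.items() if t["energy"] > seed_threshold]
    if not seeds:
        return None
    seed = max(seeds, key=lambda idx: towers[idx]["energy"])
    members = [seed]
    for d_eta in (-1, 0, 1):
        for d_phi in (-1, 0, 1):
            idx = (seed[0] + d_eta, seed[1] + d_phi)
            if (idx != seed and idx in towers
                    and towers[idx]["energy"] > neighbour_threshold):
                members.append(idx)
    # Total energy and energy-weighted position of the cluster.
    energy = sum(towers[i]["energy"] for i in members)
    eta = sum(towers[i]["energy"] * towers[i]["eta"] for i in members) / energy
    phi = sum(towers[i]["energy"] * towers[i]["phi"] for i in members) / energy
    return {"towers": members, "energy": energy, "eta": eta, "phi": phi}
```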

4.4 Charged Lepton Identification 

In this section we discuss charged lepton identification. Except when otherwise mentioned,
a process described for a particle is also true for its antiparticle. In this analysis we
use electron, muon and isolated track candidates.

4.4.1 Electron Identification 

An electron typically deposits most of its energy in the electromagnetic calorimeters. The
basic selection for an electron candidate is a high-$p_T$ track, isolated from other activity
in the tracking systems, which is matched to an electromagnetic calorimeter cluster. The
isolation requirement ensures that the charged lepton candidate originates in the primary
interaction vertex and therefore is the daughter particle of the W boson, and does not
originate in a B hadron semileptonic decay, as in the case of a jet originating in a b quark.
Further selection criteria, summarized in Table 4.2, are required to reconstruct the tight
central electron candidates in the region $|\eta| < 1.1$ (CEM calorimeter), which are therefore
denoted in this thesis as CEM.

Table 4.2: Criteria for central electron candidate (CEM) identification.

Criterion                  Central Electron (CEM)
$E_T$                      > 20 GeV
$E_{had}/E_{em}$           < 0.055 + 0.00045 · E
Isolation                  < 0.1
Track $z_0$                < 60 cm
Track $p_T$                > 10 GeV/c
COT Axial Segments         > 3
COT Stereo Segments        > 2
$L_{shr}$                  < 0.2
E/p                        < 2.0 for $p_T$ < 50 GeV/c
$\chi^2$                   < 10.0
$Q \cdot \Delta x$         -3.0 < $Q \cdot \Delta x$ < 1.5
$|\Delta z|$               < 3.0 cm



The notation used in Table 4.2 is explained below:

• $E_T$ is the transverse energy of the calorimeter cluster. The trigger requires $E_T$ >
18 GeV, but the analysis requires $E_T$ > 20 GeV to make sure that the trigger is fully
efficient.

• $E_{had}/E_{em}$ is the ratio between the energy deposited in the hadronic calorimeters
(CHA, WHA or PHA) and the energy deposited in the electromagnetic calorimeters
(CEM). This ratio is very small for electron candidates, on the order of 5%.

• Isolation is defined as the ratio between the energy deposited in all the additional towers
located in a cone of radius $R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2} = 0.4$ around the calorimeter
cluster and the energy of the calorimeter cluster itself. Isolation is required to be smaller than
0.1, which means that the calorimeter cluster is required to be isolated from other
significant energy deposits in the calorimeter.

• Track $z_0$ is the z coordinate of the position where the isolated track intersects the beamline.

• Track $p_T$ is the transverse momentum of the electron candidate, measured
from the track curvature.

• COT Axial (Stereo) Segments is the number of axial (stereo) COT layers that have at
least 5 hits each associated with the track.

• $L_{shr}$ is a quantity that measures how well the theoretical electron shower profile
matches the distribution of energy in the calorimeter cluster.

• E/p is the ratio between the energy of the calorimeter cluster and the track momentum.

• $\chi^2$ is the $\chi^2$ of the fit of the shower-maximum profile measured by the shower maximum
detector (CES or PES) with respect to the electron test beam data.

• $\Delta x$ is the signed difference in x coordinate between the track and the calorimeter
cluster when the track is extrapolated to the position of the shower maximum.

• Q is the measured electric charge of the electron candidate (negative for electron and
positive for positron).

• $|\Delta z|$ is the absolute value of the difference in the z coordinate between the position of
the calorimeter cluster and the position of the track extrapolated to the position of the
shower maximum.
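
As an illustration of how the requirements of Table 4.2 combine, the sketch below applies them to a single candidate. The dictionary keys are hypothetical names chosen for this example, the comparison operators follow the table as printed, and taking the absolute value of the track $z_0$ is an assumption.

```python
def passes_cem_selection(e):
    """Apply the Table 4.2 requirements to one electron candidate.

    `e` is a dict whose key names are invented for this sketch.
    Energies are in GeV, momenta in GeV/c and lengths in cm.
    """
    return (
        e["et"] > 20.0
        and e["had_over_em"] < 0.055 + 0.00045 * e["energy"]
        and e["isolation"] < 0.1
        and abs(e["track_z0"]) < 60.0  # absolute value assumed
        and e["track_pt"] > 10.0
        and e["cot_axial_segments"] > 3
        and e["cot_stereo_segments"] > 2
        and e["lshr"] < 0.2
        # The E/p requirement applies only below 50 GeV/c.
        and (e["track_pt"] >= 50.0 or e["e_over_p"] < 2.0)
        and e["chi2_strip"] < 10.0
        and -3.0 < e["charge"] * e["delta_x"] < 1.5
        and abs(e["delta_z"]) < 3.0
    )
```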

4.4.2 Muon Identification 



A muon behaves like a minimum ionizing particle due to its rest mass, which is about 200
times larger than that of the electron [3]. Therefore muons deposit very little energy in
the calorimeter systems. The outer layer of the CDF detector is instrumented with muon
chambers. Various types of muon candidates are reconstructed, each bearing the name of the
muon detector that records them. Ionization deposits from a muon candidate in a given
muon detector constitute a "stub".

The basic selection for a muon candidate is a well reconstructed high-$p_T$ track isolated
from other activity in the detector that is matched to a muon stub. Other selection cuts
required to refine the selection above are summarized in Table 4.3 for two types of tight
muon candidates (CMUP and CMX).

CMUP muon candidates are reconstructed in the region $|\eta| < 0.6$ and a track is required
to match stubs in both CMU and CMP muon detectors. Some very energetic hadrons are
able to deposit some of their energy outside the calorimeter systems, in the muon detectors,
and therefore fake the muon signature. The "fake muon" fraction in the collected sample is
decreased considerably by requiring both CMU and CMP stubs.

CMX muon candidates are reconstructed in the region $0.65 < |\eta| < 1.0$ and require a stub
in the CMX muon detector.

Table 4.3: Criteria for muon candidate (CMUP and CMX) identification.

Criterion                  CMUP and CMX
$p_T$                      > 20 GeV
$E_{had}$                  < 6 + max(0, (p - 100) · 0.0280) GeV
$E_{em}$                   < 2 + max(0, (p - 100) · 0.0115) GeV
Isolation                  < 0.1
Track $z_0$                < 60 cm
Track $p_T$                > 10 GeV/c
COT Axial Segments         > 3
COT Stereo Segments        > 2
Impact Parameter $d_0$     < 0.2 cm (0.02 with silicon hits)
$\chi^2$                   < 2.3
CMU $\Delta x$             < 3 cm
CMP $\Delta x$             < 5 cm
CMX $\Delta x$             < 6 cm
CMX $\rho_{COT}$           > 140 cm



4.4.3 Isolated Track Identification 

Our original contribution to the WH analysis is the introduction of a new charged lepton
reconstruction method based on a high-$p_T$ track that is isolated from other activity in the
tracking systems, with no requirements on energy deposits in the calorimeter or muon
detectors. We call these "isolated tracks". The isolation requirement is necessary in order
to ensure that the track corresponds to a charged lepton produced in the decay of a W boson
and is not part of a jet of hadrons originating from a quark. Because the isolated track is
not required to match a calorimeter cluster or a muon stub, we can recover real charged
leptons that arrive in non-instrumented regions of the calorimeter or muon detectors, as
seen in Figure I7TT1. Thus, the signal acceptance of the WH search is increased and this has
the potential to increase the WH search sensitivity.

The first analysis that used isolated tracks at CDF [53] was a top quark cross section
measurement in top quark pair production where each top quark decays to one W boson
and a b quark and where each W boson decays leptonically to a charged lepton and a
corresponding neutrino, using an integrated luminosity of 1.1 fb$^{-1}$. In this analysis one
charged lepton was either an electron or a muon candidate and the second charged lepton
was an isolated track candidate. The events used in this analysis were selected using an
electron-inclusive trigger or a muon-inclusive trigger.

In our analysis we have exactly one charged lepton candidate and in this channel the charged
lepton is an isolated track candidate. However, we do not have an isolated track trigger at
CDF. This is why we use three triggers with requirements on MET and jets, but not on
charged leptons. One of my original contributions consisted in parameterizing the trigger
efficiency turn-on curves and measuring the systematic uncertainty of this procedure.

My work contributed directly to the neural network WH searches that used one (two)
MET-based triggers in 2.7 fb$^{-1}$ (4.3 fb$^{-1}$), as described in the Ph.D. thesis [T] ([3]) and in their
corresponding publications and CDF and Tevatron combinations. As my work evolved over
time, this thesis presents for the first time the addition of a third MET-based trigger and a
novel method to combine an unlimited number of triggers without resorting to an "OR" between
triggers, as described in detail in Subsection 6.1.4.

In order to reconstruct isolated tracks, we select on an event-by-event basis a set of good
quality tracks that meet the requirements detailed in Table 4.4.



Table 4.4: Good quality tracks criteria.

Variable                           Cut
$p_T$                              > 0.5 GeV
$\Delta R$ (track, candidate)      < 0.4
$\Delta z_0$ (track, candidate)    < 5 cm
COT Axial Hits                     > 20
COT Stereo Hits                    > 10



Next, for each good quality track we define and compute a quantity called "track isolation"
as

$$ \text{Track Isolation} = \frac{p_T(\text{track candidate})}{p_T(\text{track candidate}) + \sum p_T(\text{other tracks})}, \qquad (4.4) $$

where $p_T$(track candidate) is the transverse momentum of the specific track we analyze
(candidate) and $\sum p_T$(other tracks) is the sum of the transverse momenta of all good quality
tracks within a cone of radius 0.4 around the candidate track. Given this definition, a track is
fully isolated if it has a track isolation of 1.0. However, a track is very seldom fully isolated
in a hadronic collision. In this analysis we consider a track to be isolated if it has an
isolation larger than 0.9, which means that more than 90% of the $p_T$ in the vicinity of the track
corresponds to the track itself.
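
The track isolation defined above can be computed with the following sketch. The track representation (a dictionary with keys "pt", "eta" and "phi") and the function names are assumptions made for illustration; the input list is assumed to contain only good quality tracks.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane, with the phi difference wrapped into [-pi, pi)."""
    d_phi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, d_phi)


def track_isolation(candidate, good_tracks, cone=0.4):
    """Candidate pT divided by the total pT of good tracks inside the cone."""
    others = sum(
        t["pt"] for t in good_tracks
        if t is not candidate
        and delta_r(t["eta"], t["phi"], candidate["eta"], candidate["phi"]) < cone
    )
    return candidate["pt"] / (candidate["pt"] + others)
```

A candidate with no other good quality track inside its cone has an isolation of exactly 1.0, and each additional nearby track lowers the value.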

From the sample of good quality tracks with isolation larger than 0.9 we select the sample
of "isolated track" candidates by requiring further, tighter track reconstruction criteria, as
described in Table 4.5.



Table 4.5: Criteria for isolated track candidates identification.

Criterion                  ISOTRK
$p_T$                      > 20 GeV
$\eta$                     < 1.2
$z_0$                      < 60 cm
Isolation                  > 0.9
COT Axial Hits             > 24
COT Stereo Hits            > 20
$\chi^2$                   > $10^{-8}$
Impact Parameter $d_0$     < 0.2 cm (0.02 with silicon hits)



The purity for isolated tracks is about 80%, whereas the purity for TIGHT charged lepton 
candidates is about 90%, as shown in Figure l8~Tl 

4.4.4 Charged Lepton Reconstruction Scale Factor 

The efficiency to reconstruct charged lepton candidates is not the same in Monte Carlo
simulated events as in real data events. For this reason we correct each simulated event by
a scale factor for its specific charged lepton reconstruction category, which is defined as the
ratio between the reconstruction efficiencies in data and in simulated events.

Scale Factor Measurement for Isolated Tracks 

We studied WH Monte Carlo simulated events and concluded that ISOTRK charged lepton 
candidates are muon candidates in 85% of cases, electron candidates in 7% of cases and tau 
lepton candidates in 8% of cases pQ. We measure the scale factor for muon ISOTRK charged 
leptons and correct the systematic uncertainty on that value for the fact that in 15% of the 
cases ISOTRK events are not muon candidates. 

We select a sample of events where a Z boson decays to a muon-antimuon pair and use the
generic method called "tag and probe", which is also used to measure the scale factor for the
reconstruction of CMUP muons in Reference [64]. We select events with a well reconstructed
tight muon (CMUP or CMX), which is considered the tag leg, and a high-$p_T$ track isolated
from other activity in the detector, which is called the probe leg. We apply further selection
requirements to improve the purity of the Z boson sample: the invariant mass of the tag and
probe legs must be in the Z boson mass window (81-101 GeV/$c^2$); the absolute value of
the difference in z position between the two candidates must be smaller than 4 cm; the legs
must have opposite electric charges; for data events, the tag charged lepton must fire the
CMUP or the CMX muon trigger; the event must pass the cosmic veto to ensure it is not
produced by cosmic rays (see footnote 2); the probe charged lepton must have $p_T$ > 20 GeV/c
and be matched to a muon stub. After these cuts, we have a very pure Z boson sample and we
are confident that the probe muon candidate is also a real muon. We measure the efficiency
with which this probe muon candidate is reconstructed as an ISOTRK charged lepton. We divide
the efficiency in data by that in Monte Carlo simulated events to obtain the ISOTRK
reconstruction scale factor for each jet bin. For events with exactly two tight jets, as in
our main analysis, we obtain the average value of 0.937 ± 0.009. Figures 4.1, 4.2 and 4.3 show
the simulated and data efficiencies and scale factor as a function of the ISOTRK $\eta$, $\phi$ and $p_T$.
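
The last step of the tag-and-probe measurement, the ratio of efficiencies, can be sketched as follows. The binomial uncertainty on each efficiency and the propagation in quadrature are assumptions of this sketch, not necessarily the treatment used in the analysis code, and the event counts are invented.

```python
import math


def efficiency(n_pass, n_total):
    """Efficiency with its binomial uncertainty."""
    eff = n_pass / n_total
    return eff, math.sqrt(eff * (1.0 - eff) / n_total)


def scale_factor(data, mc):
    """Ratio of data to simulation efficiency, relative errors added in quadrature.

    `data` and `mc` are (n_pass, n_total) counts of probe legs.
    """
    eff_d, err_d = efficiency(*data)
    eff_m, err_m = efficiency(*mc)
    sf = eff_d / eff_m
    err = sf * math.hypot(err_d / eff_d, err_m / eff_m)
    return sf, err


# Hypothetical counts, chosen only to exercise the functions.
sf, err = scale_factor(data=(900, 1000), mc=(9600, 10000))
```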




Figure 4.1: Isolated track reconstruction efficiency and scale factor as a function of lepton $\eta$.
The upper plot shows the reconstruction efficiency in both data (black) and simulated (red) 
events, while the lower plot shows the resulting scale factor as the ratio of the histograms 
from the top plot. 



However, we do not quote a systematic uncertainty of 1%. We take into account that in 15%
of cases the ISOTRK charged lepton is either an electron or a tau lepton candidate. We
assign a 25% uncertainty for these cases p] [3]. The total ISOTRK scale factor systematic
uncertainty is computed as a weighted average, namely 0.85 · 1% + 0.15 · 25% ≈ 4.5%.
Therefore, the ISOTRK scale factor for our analysis is 0.937 ± 0.042.

2 Cosmic rays that reach the CDF detector, situated 10 meters underground and shielded with
thick concrete walls, are mostly muons. Such a muon from cosmic rays is typically very energetic
and therefore its trajectory is not curved much by the solenoid magnetic field. Thus, it produces
an almost straight track in the tracking systems. However, the software reconstruction looks for
tracks starting from the centre of the detector and reconstructs this one muon as two
back-to-back muon candidates.

Figure 4.2: Isolated track reconstruction efficiency and scale factor as a function of lepton $\phi$.
The upper plot shows the reconstruction efficiency in both data (black) and simulated (red)
events, while the lower plot shows the resulting scale factor as the ratio of the histograms
from the top plot.

The code for this procedure was written by a postdoctoral researcher (Nils Krumnack). For 
the past two years, I maintained the code after he left the collaboration and I used it to 
measure the ISOTRK scale factors for the WH analyses of 2009, 2010 and 2011. 



Scale Factor Measurement for Tight Charged Leptons 



We use a very similar method to measure the scale factor for the reconstruction of tight
muon candidates, CMUP and CMX. For the dataset of 5.7 fb$^{-1}$ used in this analysis, we
measure for CMUP candidates a scale factor of 0.892 ± 0.002 and for CMX candidates a
scale factor of 0.948 ± 0.002.

A similar method, applied to a sample of Z bosons decaying to electron-positron pairs, is
used to measure the scale factor for the CEM tight electron, which is 0.977 ± 0.001.

Figure 4.3: Isolated track reconstruction efficiency and scale factor as a function of lepton $p_T$.
The upper plot shows the reconstruction efficiency in both data (black) and simulated (red)
events, while the lower plot shows the resulting scale factor as the ratio of the histograms
from the top plot.



4.5 Missing Transverse Energy Identification 

Neutrinos are the only subatomic particles that leave the detector completely undetected.
Their momentum and energy appear to be missing. Since the longitudinal momenta of the
colliding partons are unknown and not necessarily equal, we can only say that the total
transverse momentum of the $p\bar{p}$ collision is zero. Therefore we observe a missing transverse
energy (MET or $\not{E}_T$) due to the neutrino.

The vector missing transverse energy ($\vec{\not{E}}_T$) is the opposite of the vector sum of all
transverse energy deposits in calorimeter towers, with the z position of the neutrino closest to the
beamline considered to be the z vertex position. The missing transverse energy is the absolute value
of the vector missing transverse energy ($\not{E}_T = |\vec{\not{E}}_T|$). From its definition we see
that $\not{E}_T$ is formed by missing transverse energy from all undetected particles, such as one or
more neutrinos, but also subatomic particles predicted by theories beyond the Standard Model,
such as the lightest supersymmetric particle or particles travelling in extra
spatial dimensions. In our analysis, real $\not{E}_T$ is produced by only one neutrino from the W
boson decay.
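
The definition of the vector missing transverse energy as the opposite of the vector sum of calorimeter deposits can be illustrated with a short sketch. The tower representation (a dictionary with transverse energy "et" and azimuth "phi") is an assumption for this example, and no muon or jet energy corrections are applied here.

```python
import math


def missing_et(towers):
    """Return (MET magnitude, MET azimuth) from calorimeter towers.

    Each tower is a dict with transverse energy "et" (GeV) and azimuth
    "phi" (rad), already computed with respect to the chosen z vertex.
    """
    met_x = -sum(t["et"] * math.cos(t["phi"]) for t in towers)
    met_y = -sum(t["et"] * math.sin(t["phi"]) for t in towers)
    return math.hypot(met_x, met_y), math.atan2(met_y, met_x)
```

Two towers of equal transverse energy that are exactly back to back give zero missing transverse energy; any imbalance shows up as a vector pointing away from the excess.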

Missing transverse energy can also be faked by mismeasured energy deposits in the
calorimeter, especially due to jets. Typically jet energies are underestimated, as seen in the
next section, which translates into an overestimation of the $\not{E}_T$. Events that produce jets in
pure QCD processes tend to have fake $\not{E}_T$, as we will see in the background chapter.

At trigger level, $\not{E}_T$ is computed using only calorimeter information and assuming the
primary interaction vertex is located in the centre of the detector. As shown in Subsection 4.2.3, I
performed a study in the context of the Higgs Trigger Task Force that showed that improving
the primary vertex position measurement at L1 trigger level, by noting in which side
of the detector it is located, does not significantly improve the offline z vertex position
reconstruction. This study confirmed this approach to be optimal at trigger level. However,
the offline reconstructed $\not{E}_T$ is corrected for the true z vertex position, for the corrected jet
energies, and by subtracting the momenta of minimum ionizing high-$p_T$ muons and adding
back the transverse energy of the calorimeter towers crossed by the muon. These corrections
can be summarized in the following equation:

$$ \not{E}_T = \not{E}_T^{raw} - \sum_{muon} p_T + \sum_{muon} E_T(\mathrm{EM} + \mathrm{HAD}) - \sum_{jet} E_T(\text{Jet Energy Correction}) \qquad (4.5) $$

4.6 Jet Identification 

A jet consists of a collimated shower of secondary particles produced in the hadronization
of a quark or gluon produced in the primary $p\bar{p}$ interaction. There are various possible
algorithms to reconstruct jets from calorimeter towers. The algorithm used in this analysis
is called JETCLU [65]. JETCLU is a cone algorithm that searches for towers with energy
deposits within a cone of radius $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} = 0.4$ in the
$\eta$-$\phi$ plane around the seed cluster. If a charged lepton is reconstructed within this cone,
its energy is neglected in the calculation of the jet energy.

A diagram of the jet production at CDF, passing from parton jet to particle jet and then to
calorimeter jet, is shown in Figure 4.4.

Figure 4.4: Diagram of jet production at CDF, from parton-level jet to particle-level jet and
then to calorimeter-level jet. Credit image to the CDF collaboration.

The energy of the jet is corrected for various effects [66]:

• Relative Energy Corrections take into account the fact that the detector is not uniform
in $\eta$, since the plug and central calorimeters have different geometries and because there
are cracks between calorimeter towers. Central calorimeters are better calibrated than
the plug ones. This is why plug calorimeters are corrected with respect to the central
calorimeters using Pythia Monte Carlo (MC) simulated events and di-jet data events.
In our analysis, both data and MC events are corrected with respect to $\eta$ to make sure
there is a uniform jet energy response across the detector.

• Multiple Interaction Corrections take into account the fact that there is typically
more than one $p\bar{p}$ interaction in a bunch crossing. For an instantaneous luminosity on
the order of $10^{32}$ cm$^{-2}$ s$^{-1}$ the average number of primary interactions is three. This
is why CDF uses algorithms to select the correct primary interaction vertex. In addition,
the energy of the jets is corrected for the energy deposits from particles
produced in $p\bar{p}$ interactions other than the one of interest that fall in the same
calorimeter cluster as the selected jet. These corrections are derived from
minimum bias data events and are parametrized as a function of the number of primary
vertices in the bunch crossing (event).

• Absolute Energy Corrections take into account the effect of the non-linearities and the
uninstrumented regions of the detector. These corrections relate the calorimeter-level jet
to the hadron-level jet produced in the hadronization of a quark or gluon.

• Underlying Event Corrections subtract the energy deposited in the calorimeter towers
by the underlying event. The underlying event is represented by soft energy depositions
due to the spectator quarks and gluons in the protons and antiprotons.

• Out-of-Cone Corrections add the energy deposited in calorimeter towers that have not
been assigned to the jet by the JETCLU algorithm.



The corrected transverse jet energy can be summarized by the following equation:

$$ E_T^{parton} = \left( E_T^{measured} \times f_{rel} - E_T^{MI} \times N_{vtx} \right) \times f_{abs} - E_T^{UE} + E_T^{OOC}, \qquad (4.6) $$

where the correction factors are $f_{rel}$ (the scale factor that makes the jet energy measurement
uniform for various $\eta$), $E_T^{MI} \times N_{vtx}$ (the correction for multiple $p\bar{p}$ interactions per
bunch crossing), $f_{abs}$ (the absolute energy correction determined by matching the parton energy
to the jet energy), $E_T^{UE}$ (the correction for the underlying event) and $E_T^{OOC}$
(the out-of-cone correction for the energy of the initial parton that is not reconstructed in
the jet cone).
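
The order in which the corrections are applied can be made explicit with a short sketch. It assumes the form $(E_T^{measured} \times f_{rel} - E_T^{MI} \times N_{vtx}) \times f_{abs} - E_T^{UE} + E_T^{OOC}$ for the corrected energy; the argument names and the numerical values in the usage line are illustrative and are not CDF calibration constants.

```python
def corrected_jet_et(et_measured, f_rel, et_mi, n_vtx, f_abs, et_ue, et_ooc):
    """Apply the jet energy corrections in sequence (energies in GeV)."""
    et = et_measured * f_rel  # relative (eta-dependent) correction
    et -= et_mi * n_vtx       # energy from multiple interactions, per vertex
    et *= f_abs               # absolute energy scale
    et -= et_ue               # underlying event
    et += et_ooc              # out-of-cone energy
    return et


# Hypothetical inputs, for illustration only.
et_parton = corrected_jet_et(50.0, 1.02, 0.3, 2, 1.2, 0.4, 3.0)
```

The relative and multiple-interaction corrections act on the calorimeter-level energy before the absolute scale is applied, while the underlying-event and out-of-cone terms are additive at the end.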



The corrections and their systematic uncertainty can be seen in Figure 4.5. The transverse
jet energy resolution [66] is given approximately by




Figure 4.5: The left hand side plot shows the absolute jet corrections for a cone size of 0.4 as
a function of the calorimeter-level jet $p_T$. The right hand side plot represents the jet energy
scale uncertainty due to detector calibration and simulation. The solid lines represent the
total systematic uncertainty and the dotted lines the partial contributions. Credit image to
the CDF collaboration.



In our analysis we classify events by the number of tight jets. Tight jets are jets that pass the
following selection criteria: $E_T$ > 20 GeV and $|\eta|$ < 2.0. Events can also have loose
jets. Loose jets pass looser selection criteria and are mutually exclusive with the tight jets.
Loose jets have 12 GeV < $E_T$ < 20 GeV and $|\eta|$ < 2.0, or $E_T$ > 12 GeV and 2.0 < $|\eta|$ < 2.4.
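
The tight and loose jet definitions can be expressed as a small classification function. The function name and return values are chosen for this sketch, and the inequalities are strict, as in the text.

```python
def classify_jet(et, eta):
    """Return "tight", "loose" or None for a corrected jet (ET in GeV)."""
    abs_eta = abs(eta)
    if et > 20.0 and abs_eta < 2.0:
        return "tight"
    if (12.0 < et < 20.0 and abs_eta < 2.0) or (et > 12.0 and 2.0 < abs_eta < 2.4):
        return "loose"
    return None
```

An energetic jet in the region 2.0 < $|\eta|$ < 2.4 is therefore counted as loose, not tight, regardless of how large its transverse energy is.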



4.7 b-jet Identification



In this analysis we search for the associated production of a W boson and a Higgs
boson, where the latter decays to a bottom-antibottom quark pair ($b\bar{b}$ pair). Each quark
hadronizes and is seen in the detector as a jet. An essential point of our analysis is to
identify whether a given jet is produced by a b quark or not. After hadronization, b hadrons
(mesons or baryons) have a relatively long lifetime, on the order of a few picoseconds,
and travel a measurable distance before decaying. Therefore, we can see in the silicon detectors
a secondary vertex displaced by about three millimetres from the primary $p\bar{p}$ interaction
vertex. The tracks originating in the displaced secondary vertex have on average larger values
of the $d_0$ parameter.

In this analysis we use two b-tagging algorithms: Secondary Vertex Tagger (SecVtx) and
Jet Probability Tagger (JetProb).



4.7.1 Secondary Vertex Algorithm 



The Secondary Vertex algorithm (SecVtx) [67] reconstructs secondary vertices displaced
with respect to the primary interaction vertex. SecVtx operates not on an event-by-event
basis, but on a jet-by-jet basis, which means that one event can have two or more jets
identified by SecVtx as originating from a b hadron, as shown in Figure 4.6.

Figure 4.6: Diagram of a secondary vertex displaced with respect to a primary interaction
vertex, typical for a jet originating from a b quark. Credit image to the CDF collaboration.



SecVtx starts by looking at tracks well reconstructed by the Inside-Out tracking algorithm
in the cone of radius 0.4 around the calorimeter cluster. That means that silicon tracks
are required first and then a match to a COT track is required. In order to reject poorly
reconstructed tracks, all tracks used by SecVtx must have $p_T$ > 0.5 GeV/c, their impact
parameter $d_0$, corrected for the primary interaction vertex, must satisfy
$|d_0|$ < 0.3 cm, and the distance in the z coordinate between the track and the primary
vertex must be less than 5 cm ($|z_{track} - z_{primary\ vertex}|$ < 5 cm). In addition, all tracks must
have a minimum number of hits in the silicon detector and the COT, and the track fit must pass
$\chi^2$ criteria.

Once all the tracks inside the jet cone are reconstructed, SecVtx follows a two-pass approach.

First, SecVtx tries to fit three loosely defined tracks to a common secondary vertex. These
tracks must have $p_T$ > 0.5 GeV/c and $|d_0/\sigma_{d_0}|$ > 2.5, and at least one of the tracks must have
$p_T$ > 1.0 GeV/c.

If no secondary vertex is found in the first pass, then in the second pass only two tracks are
required, but they have to pass tighter selection criteria of $p_T$ > 1.0 GeV/c and $|d_0/\sigma_{d_0}|$ > 3.0.

If a secondary vertex is found by the first or second pass, then its two-dimensional decay
length with respect to the primary vertex, $L_{xy}$, is measured together with its uncertainty
$\sigma_{L_{xy}}$. From these we obtain the decay length significance $S_{L_{xy}} = L_{xy}/\sigma_{L_{xy}}$.
$L_{xy}$ is positive (negative) when the tracks emerging from the secondary vertex are heading in
the same (opposite) direction as the jet. If $|S_{L_{xy}}|$ > 7.5 the jet is SecVtx tagged, with a
positive tag if $S_{L_{xy}}$ > 7.5 and a negative tag if $S_{L_{xy}}$ < -7.5.
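
The tagging decision based on the decay length significance can be sketched as below. The function name and the encoding of the result as +1, -1 or 0 are choices made for this illustration.

```python
def secvtx_tag(l_xy, sigma_l_xy, threshold=7.5):
    """Return +1 (positive tag), -1 (negative tag) or 0 (untagged).

    `l_xy` is the signed two-dimensional decay length and `sigma_l_xy`
    its uncertainty, in the same units.
    """
    significance = l_xy / sigma_l_xy
    if significance > threshold:
        return 1
    if significance < -threshold:
        return -1
    return 0
```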

A positive SecVtx tag means that the jet has been identified as originating from a b quark.
A negative SecVtx tag means that the tracks have not been properly identified and that the
primary vertex lies in front of the jet. The negatively tagged jets are unlikely to be produced
by a b hadron and are used to estimate the percentage of positively tagged jets that actually
originate from light partons (such as u, d, s quarks and gluons), also called the mistag rate.
The mistag rate is parameterized as a function of various variables, as seen in Figure 4.7.



Figure 4.7: Mistag rates of SecVtx as a function of jet $E_T$ (labelled here "Jet Et") and $|\eta|$
(improperly labelled here "Jet Eta"). The rate is measured using inclusive jet data. Credit
image to the CDF collaboration.



The SecVtx algorithm is tuned to accept only a very low mistag rate, of the order of 1-2%.
The trade-off is a b-tagging efficiency of only about 40%, as seen in Figure 4.8. This means
that about 60% of jets originating from a b quark will not be positively tagged by SecVtx.

Also, out of the positively tagged jets, about half originate from c quarks, which are also
relatively massive and long lived. In fact, b-taggers are in general heavy flavour taggers,
where the heavy flavour quarks are the b and c quarks.

Figure 4.8: b-tagging efficiency of SecVtx as a function of jet E_T (labelled here "Jet Et"),
jet |η| (improperly labelled here "Jet Eta"), and number of primary interaction vertices.
The operating point called "tight" is used for this analysis. Image credit: CDF
collaboration.

Furthermore, the efficiency for tagging a jet with SecVtx is different between data and Monte
Carlo simulated events. We define a per-jet b-tagging efficiency scale factor for SecVtx as
the ratio between the efficiencies for data and for simulated events. We select a jet sample
enriched in jets originating from b quarks and we measure a b-tagging scale factor of
0.96 ± 0.05.
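
As a minimal sketch of how such a scale factor could be computed and used, the Python snippet below forms the ratio of two efficiencies and propagates their uncertainties, assuming they are uncorrelated. The function names are hypothetical, and applying one factor of the scale factor per tagged b jet is a simplifying assumption made for illustration, not a description of the procedure used in the analysis.

```python
import math


def scale_factor(eff_data, err_data, eff_mc, err_mc):
    """Return (SF, uncertainty) for SF = eff_data / eff_mc.

    Assumes the two efficiency uncertainties are uncorrelated, so the
    relative uncertainties add in quadrature.
    """
    sf = eff_data / eff_mc
    rel = math.sqrt((err_data / eff_data) ** 2 + (err_mc / eff_mc) ** 2)
    return sf, sf * rel


def tagged_event_weight(n_tagged_b_jets, sf):
    """Simplified weight for a simulated event: one factor of SF per tagged b jet."""
    return sf ** n_tagged_b_jets
```

For example, with purely illustrative inputs `scale_factor(0.384, 0.016, 0.40, 0.012)` gives a central value of 0.96, and a simulated event with two tagged b jets would then be weighted by 0.96 squared.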

4.7.2 Jet Probability Algorithm 

Another b-tagging algorithm that operates on a jet-by-jet basis and is employed in this
analysis is Jet Probability (JetProb) [68] [69]. JetProb looks at the distribution of impact
parameters of the tracks reconstructed in the cone of the jet to estimate the probability that
the ensemble of these tracks is consistent with originating from the primary interaction vertex.

The impact parameter of a track is considered positive (negative) if the angle between the
track and the jet it belongs to is smaller (larger) than 90 degrees. More precisely, an impact
parameter in the transverse plane is positive (negative) if cos φ > 0 (cos φ < 0), where φ is
the angle between the direction of the jet and the vector from the primary vertex to the point
of closest approach of the track, as seen in Figure 4.9.
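
The sign convention can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the CDF code: the function name and its arguments (jet azimuthal angle, and transverse coordinates of the primary vertex and of the point of closest approach) are hypothetical. The sign of cos φ is obtained from the dot product between the jet direction and the vector from the primary vertex to the point of closest approach.

```python
import math


def impact_parameter_sign(jet_phi, pv_xy, pca_xy):
    """Return +1, -1 or 0 according to the sign of cos(phi).

    jet_phi : azimuthal angle of the jet axis (radians)
    pv_xy   : (x, y) of the primary vertex
    pca_xy  : (x, y) of the track's point of closest approach to the primary vertex
    """
    dx = pca_xy[0] - pv_xy[0]
    dy = pca_xy[1] - pv_xy[1]
    # The dot product with the jet unit vector is proportional to cos(phi).
    dot = dx * math.cos(jet_phi) + dy * math.sin(jet_phi)
    if dot > 0:
        return 1
    if dot < 0:
        return -1
    return 0  # cos(phi) = 0 is not covered by the definition in the text
```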




Figure 4.9: Tracks from a primary vertex (left) and from a secondary vertex (right).
Illustration of the track impact parameter definition that is used by the JetProb b-tagging
algorithm. Image credit: CDF collaboration.



Jets originating from light quarks (u, d, s) and gluons typically produce their particles very
close to the primary interaction vertex, so their tracks should appear to emerge from the
primary interaction vertex. However, due to finite tracking resolution, these reconstructed
tracks have non-zero impact parameter values, as seen in the left-hand side of Figure 4.9. A
track has equal chances to have a positive or negative impact parameter. Therefore, the
distribution of impact parameter values for tracks from light flavour jets is symmetric around
zero, as seen in the top of Figure 4.10.




[Figure 4.10 image; horizontal axis: signed impact parameter.]

Figure 4.10: Tracks from a primary vertex (left) and from a secondary vertex (right).
Illustration of the signed track impact parameter distribution that is used by the JetProb
b-tagging algorithm. Image credit: CDF collaboration.



However, jets originating from heavy quarks (c and b) will have tracks that emerge from a
displaced secondary vertex, which translates to impact parameter values that are on average
positive and larger in absolute value, as seen in the right-hand side of Figure 4.9. Therefore,
the distribution of impact parameter values for tracks from heavy flavour jets is elongated
toward positive values, as seen in the bottom of Figure 4.10.

The JetProb algorithm uses only good quality tracks that have p_T > 0.5 GeV/c, |d_0| <
0.1 cm, more than 3 hits in the silicon detector, more than 20 COT axial segment hits and
more than 17 COT stereo segment hits, as well as less than 5 cm in the z coordinate between
the track and the primary interaction vertex.



The first step is to quantify, for every track with a positive impact parameter, the probability
that it originates from the primary interaction vertex (we denote by P_k the probability of the
k-th such track). Tracks with negative impact parameters are only used to quantify the
uncertainty on the impact parameters, which depends on the tracking detector resolution, the
beam spot size and multiple scattering. The impact parameter significance S_d0 is defined as
the ratio of the impact parameter and its uncertainty:

\[ S_{d_0} = \frac{d_0}{\sigma_{d_0}}. \qquad (4.8) \]

The individual track probability is parametrized as a function of its impact parameter
significance:

\[ P_{\mathrm{tr}}(S_{d_0}) = \frac{\int_{-\infty}^{-|S_{d_0}|} R(S)\, dS}{\int_{-\infty}^{0} R(S)\, dS}, \qquad (4.9) \]

where R(S) is the impact parameter significance resolution function and P_k is the value of
P_tr for the k-th track.

The next step is to quantify a probability on a jet by jet basis (Pjet) that the jet assumed 
to be made up of N well reconstructed tracks with positive impact parameter is consistent 
with originating from a primary interaction vertex (i.e. that the jet is originating from a 
light flavour quark or gluon): 



\[ P_{\mathrm{jet}} = \Pi \sum_{k=0}^{N-1} \frac{(-\ln \Pi)^{k}}{k!}, \qquad (4.10) \]

where

\[ \Pi = \prod_{k=1}^{N} P_k. \qquad (4.11) \]

By the definition of P_jet and the considerations above, we can deduce that P_jet is
uniformly distributed between 0 and 1 for light flavour jets and peaks at 0 for heavy flavour
jets, as seen in Figure 4.11. In this analysis we require P_jet < 0.05, for which the JetProb
b-tagging efficiency is approximately 33% and the JetProb b-tagging scale factor is
0.78 ± 0.05. A mistag matrix is also measured for JetProb.

Figure 4.11: At left, Jet Probability b-tagger distributions from Monte Carlo simulated
events for jets originating from b quarks (full red circles), c quarks (empty blue circles) and
light quarks or gluons (empty green squares). At right, Jet Probability b-tagger distributions
from data events: inclusive electron data enriched in heavy flavour jets (full red circles) and
generic QCD events selected with a trigger requiring jets with energy larger than 50 GeV
(empty green squares). Image credit: CDF collaboration.
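
The jet probability built from the individual track probabilities can be sketched in a few lines of Python. The sketch below is illustrative only. The function `track_probability` assumes, purely for illustration, a unit-width Gaussian resolution function R(S), for which the ratio of integrals in the track probability reduces to the complementary error function; in the analysis R(S) is determined from tracks with negative impact parameters. The function names are hypothetical.

```python
import math


def track_probability(s_d0):
    """Track probability for an assumed unit Gaussian resolution function R(S).

    For a unit Gaussian, the integral from -inf to -|S| divided by the
    integral from -inf to 0 equals erfc(|S| / sqrt(2)).
    """
    return math.erfc(abs(s_d0) / math.sqrt(2.0))


def jet_probability(track_probs):
    """P_jet = Pi * sum_{k=0}^{N-1} (-ln Pi)^k / k!, with Pi the product of the P_k."""
    n = len(track_probs)
    if n == 0:
        raise ValueError("at least one track with positive impact parameter is needed")
    product = math.prod(track_probs)
    if product <= 0.0:
        return 0.0  # avoids log(0); such a jet is maximally inconsistent with the primary vertex
    minus_log = -math.log(product)
    return product * sum(minus_log ** k / math.factorial(k) for k in range(n))


def is_jetprob_tagged(track_probs, cut=0.05):
    """Apply the P_jet < 0.05 requirement quoted in the text."""
    return jet_probability(track_probs) < cut
```

For a single track the jet probability reduces to the track probability itself, which is why the distribution is flat for light flavour jets; several tracks with small individual probabilities drive P_jet towards 0.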



4.8 Summary 



This chapter presented the reconstruction techniques for the physics objects used in this 
analysis. First the basic physics objects such as tracks, primary vertices and calorimeter 
clusters were introduced. Later, we described the reconstruction of high-level physics objects 
such as electrons, muons, isolated tracks and jet candidates, as well as missing transverse 
energy, which are used by the event selection. Finally, two algorithms used to identify jets 
originating from b quarks were described. We also presented the scale factors for charged lepton
identification and b-tagging algorithms. All these physics objects are used to reconstruct
the signal, background and data events for the WH search presented in this thesis.

Since all signal processes and all but one of the background processes are simulated using
Monte Carlo generators, in the next chapter we present how an event is generated in a Monte
Carlo simulation and what the particularities of the simulation are for each physics process.

Chapter 5

Monte Carlo Simulated Events 



In this chapter we describe the Monte Carlo simulated events used as signal and background.
In this analysis, the signal is represented by the associated production of a Standard Model
Higgs boson and a W boson (WH process). We also consider a small contribution to the
signal represented by the associated production of the Higgs boson and a Z boson, where
the Z boson decays to a pair of charged leptons and one of the charged leptons is not
reconstructed in the detector (ZH process). The following processes have identical or very
similar final state signatures and can therefore mimic the signal, constituting background
processes for this analysis: W+jets, top quark pair (tt̄), single top, Z+jets, diboson and
non-W (QCD) production. We use Monte Carlo generators to simulate signal and background
processes, except for the non-W (QCD) background, which is estimated using a data sample.



5.1 Monte Carlo Simulation 



Simulation is needed to predict the distribution of signal and background events in the data 
event sample, as well as signal efficiencies and scale factors between data and Monte Carlo 
simulated events. 

A typical particle physics event has the following steps from the initial pp̄ interaction up to
the detection of final state particles in CDF:

• Parton Distribution Functions. In a pp̄ collision it is actually a parton-parton
collision that takes place, whereas the other partons are spectators. Parton is a generic
term for a constituent of a proton or neutron and represents physically either a quark or
a gluon¹. A parton has a certain probability to carry a given fraction of the total
momentum of the proton or antiproton. This probability is called a parton distribution
function. Parton distribution functions are measured in particle physics experiments and
are then used as inputs for theory calculations or Monte Carlo simulations.

• As a proton and an antiproton approach each other with momenta of 0.98 TeV/c, a shower
of partons appears from each parton in the proton and antiproton. This is called
generation of initial state partons.

• As two of these partons collide, they transfer momentum to each other and they can
even change flavour. This process is called the hard scattering event.

• Just as the incoming partons were branching, the final state partons may also branch.
This produces the final state partons.

• Because the strong force described by QCD, which mediates the interaction between
partons, does not allow the existence of free coloured particles, an outgoing parton
will transform part of its energy to produce the mass of new partons that together form
colour-neutral hadrons travelling in approximately the same direction as the initial
parton. This process is called fragmentation.

• Most hadrons produced are unstable and decay to other particles, thus producing the
final state particles that deposit energy in the detector.

¹ Historically speaking, first came evidence of a structure in protons and neutrons and only
decades later came the confirmation of quark and gluon existence.

In the following subsections we describe the various tools used for Monte Carlo simulated
events, both for signal and background processes: event generators, parton shower and
hadronization generators, and detector simulation.

5.1.1 Event Generators 

The signal Monte Carlo simulated events are generated with PYTHIA v6.2 [70], a general-purpose
event generator. The hard parton scattering processes are computed using leading-order
matrix elements. PYTHIA uses the parton distribution functions (PDF) provided by
CTEQ5L [71]. In this manner, a full particle physics event is generated, with parton shower
and hadronization included.

The W+jets and Z+jets background events are simulated using ALPGEN [72]. It is an event
generator specialized in electroweak bosons (W and Z bosons) produced in association with
a desired number of jets coming from either quarks or gluons. PYTHIA is also used to
generate simulated events for the following background processes: diboson (WW, WZ and
ZZ) production and top quark pair (tt̄) production. The single-top background events are
simulated using MADEVENT [73] as event generator. As it produces events at parton level,
it is sensitive to the top quark polarization, which plays a role in the distribution of kinematic
quantities in the events. The top quark mass is assumed to be 172.5 GeV/c² when modelling
tt̄ and single top production. The pure QCD (non-W) background events are not simulated
at all. Instead, the contribution of this background process is measured directly from data
events.

5.1.2 Parton Showering and Hadronization

Irrespective of the event generator used, all simulated events use PYTHIA to model the
parton showering, gluon radiation and then hadronization. The parton showering process
allows for initial and final state gluon radiation. These gluons can then split into quark pairs
and thus increase the number of jets detected in CDF. Besides the initial hard scattering event,
more particles may be detected in CDF due to the effects of multiple interactions and beam
remnants. Once these particles are produced, they are all passed to the hadronization stage
of the simulation. In the case of pp̄ collisions at CDF, the hadronization takes place at small
Q² and large α_s and therefore perturbation theory calculations cannot be used. Instead,
phenomenological models that depend on the Monte Carlo generator are employed.

Table 5.1 summarizes the event generators used for the different background and signal
processes of this analysis.



Process        Event Generator    Parton Showering
WH             PYTHIA             PYTHIA
ZH             PYTHIA             PYTHIA
W+jets         ALPGEN             PYTHIA
Z+jets         ALPGEN             PYTHIA
Diboson        PYTHIA             PYTHIA
tt̄             PYTHIA             PYTHIA
Single top     MADEVENT           PYTHIA
non-W (QCD)    data               data



Table 5.1: Monte Carlo event generators and parton showering software programs used for 
Monte Carlo simulated events for signal and background processes used in this analysis. 

5.1.3 Detector Simulation 

Once the final state particles are generated, their propagation through the detector (their
interaction with matter in the detector) is simulated using GEANT 3 [73]. Interaction with
the silicon detectors is simulated using an unrestricted Landau distribution and a simple
geometrical model of the ionizing particle path length. Interaction with the tracking chamber
system is simulated using GARFIELD [75], whose parameters are tuned to match CDF
COT data [46]. Interaction with the calorimeter system uses the GFLASH [76] package. No
special parametrization is used for interaction with the muon system. A detailed description
of the CDF simulation can be found in Reference [77].

5.1.4 Monte Carlo Validation 

Since the number of events due to background processes is estimated using Monte Carlo
simulated events, it is essential to validate that the Monte Carlo simulation models the data
correctly. This is why we compare the background estimation with the data distributions for
all the kinematic quantities used in this analysis, as well as for other quantities.



5.2 Signal Samples 



In this analysis the signal process is the associated production of a W boson and a Higgs
boson, where the W boson decays leptonically to a charged lepton and a neutrino and the
Higgs boson decays to a bottom-antibottom quark pair. The leading-order Feynman
diagram of the WH → ℓνbb̄ process is presented in Figure 5.1.



Figure 5.1: Leading-order Feynman diagram for the WH associated production and decay,
the signal process of our search. Image credit: CDF collaboration.



The signature of this process consists of a reconstructed charged lepton candidate (an elec- 
tron or a muon directly and a tau candidate indirectly through its decay to an electron or 
a muon), missing transverse energy (due to the fact that the neutrino escapes the detector 
without being detected) and two jets (due to the bottom and antibottom quarks). This 
final state signature is often called "lepton + jets", but it would be more correct to call it 
"charged lepton + missing transverse energy + jets" . 

Since the same signature is obtained also for the associated production of a Standard Model 
Higgs boson and a Z boson, where the Z boson decays to a pair of charged leptons and one 
of them is not reconstructed in the detector, we also consider this small contribution to the 
signal of the analysis (the ZH channel). 

Since the mass of the Standard Model Higgs boson is unknown, we have produced several
WH and ZH samples assuming Higgs boson masses that start at 100 GeV/c², end at
150 GeV/c² and increase in steps of 5 GeV/c². We have chosen this mass range because we
are looking for a Higgs boson in the "low mass" region. The Monte Carlo event generator is
PYTHIA, which treats the Higgs boson as a resonance. The production cross section and decay
branching fraction depend on the assumed Higgs boson mass and include the latest higher
order QCD and electroweak corrections [78], as illustrated in Table 5.2.



M(H) (GeV/c²)    σ(pp̄ → WH) (pb)    σ(pp̄ → ZH) (pb)    BR(H → bb̄)
100              0.2919              0.1698              0.8033
105              0.2484              0.1459              0.7857
110              0.2120              0.1257              0.7590
115              0.1819              0.1089              0.7195
120              0.1564              0.0944              0.6649
125              0.1351              0.0823              0.5948
130              0.1169              0.0719              0.5118
135              0.1015              0.0630              0.4215
140              0.0883              0.0553              0.3304
145              0.0770              0.0487              0.2445
150              0.0673              0.0429              0.1671

Table 5.2: Cross section values for WH and ZH production in proton-antiproton collisions
at the centre-of-mass energy of √s = 1.96 TeV at the Tevatron accelerator, in units of
picobarns (pb), as well as the branching ratio of the Higgs boson decay to bottom quark pairs,
as a function of the Higgs boson mass [78].
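
As a worked illustration of how the values in Table 5.2 translate into event counts, the Python sketch below multiplies the production cross section, the H → bb̄ branching ratio and the integrated luminosity of 5.7 fb⁻¹ quoted for this analysis. The resulting number is the count of produced events only, before the W or Z boson branching fractions and before any trigger or selection efficiency; the dictionary and function names are hypothetical.

```python
# m_H in GeV/c^2 -> (sigma_WH [pb], sigma_ZH [pb], BR(H -> b bbar)), from Table 5.2
TABLE_5_2 = {
    100: (0.2919, 0.1698, 0.8033),
    105: (0.2484, 0.1459, 0.7857),
    110: (0.2120, 0.1257, 0.7590),
    115: (0.1819, 0.1089, 0.7195),
    120: (0.1564, 0.0944, 0.6649),
    125: (0.1351, 0.0823, 0.5948),
    130: (0.1169, 0.0719, 0.5118),
    135: (0.1015, 0.0630, 0.4215),
    140: (0.0883, 0.0553, 0.3304),
    145: (0.0770, 0.0487, 0.2445),
    150: (0.0673, 0.0429, 0.1671),
}

LUMI_PB = 5.7 * 1000.0  # 5.7 fb^-1 expressed in pb^-1


def produced_events(m_h, process="WH", lumi_pb=LUMI_PB):
    """Number of produced VH events with H -> b bbar: sigma x BR x luminosity."""
    sigma_wh, sigma_zh, br_bb = TABLE_5_2[m_h]
    sigma = sigma_wh if process == "WH" else sigma_zh
    return sigma * br_bb * lumi_pb


if __name__ == "__main__":
    for mass in sorted(TABLE_5_2):
        print(mass, round(produced_events(mass, "WH"), 1), round(produced_events(mass, "ZH"), 1))
```

For m_H = 115 GeV/c², this gives about 746 produced WH events with H → bb̄, which shows how small the signal is before any selection is applied.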



The Monte Carlo samples model a different number of primary interaction vertices per event 
in order to simulate the relatively low, medium and high instantaneous luminosity that the 
Tevatron accelerator provided during its current Run II. 



5.3 Background Samples 



In general there are two types of background: reducible and irreducible. The reducible
backgrounds can be reduced given larger data sets and/or improved analysis techniques.
They typically have different signatures than the signal processes, or they have the same
signature but are made up of one or more objects incorrectly reconstructed (fake objects²)
or have one particle not reconstructed at all. The irreducible backgrounds have the
same final states as the signal and therefore, even if all high level objects are correctly
reconstructed, those backgrounds will not go away. All one can do is measure them correctly.

In this analysis we use data or Monte Carlo simulated events to measure the contribution 
of the background processes to our sample. Each of them will be described in more detail 
below. 

² As examples of fake object reconstruction we quote a real electron being reconstructed as a jet and vice
versa, or mismeasured jet, photon and electron energies that appear to produce missing transverse energy
in the event.

5.3.1 Top Quark Pair Production



The first reducible background is due to top quark pair production (tt̄). A top quark decays
almost 100% of the time to a W boson and a bottom quark. For this analysis, the background
process arises when one W boson decays leptonically and some of the decay products of the
second W boson are not reconstructed properly. Since in principle better detector and
analysis techniques can reduce the fraction of events reconstructed incorrectly, this
background is reducible. The leading-order Feynman diagrams for tt̄ production
and decay are illustrated in Figure 5.2, where the process mimics our WH signal if one of
the charged leptons (process in the left diagram) or two of the jets (process in the right
diagram) are not reconstructed.



Figure 5.2: Leading-order Feynman diagrams for top quark pair production and decay
[79], reducible background processes for our WH search in the case where the reconstruction
misses a charged lepton in the process on the left (a) or two jets in the process on the right (b).



5.3.2 Z Boson + Jets Production 



The second reducible background is the associated production of a Z boson and a gluon,
where the Z boson decays to two charged leptons (and one is not reconstructed at all) and
the gluon splits into a bb̄ pair (and the mismeasured jet energies produce a fake missing
transverse energy). The leading-order Feynman diagram for this process is shown in
Figure 5.3 (a).

5.3.3 Non-W (QCD) Multi-jet Production 

The third reducible background of this analysis is the multijet production described by
QCD, where one jet fakes an electron candidate and the mismeasured energy of the
jets fakes the missing transverse energy. Since semi-leptonic decays of hadrons containing b
or c quarks produce muons, a jet could in principle also fake a muon candidate. However,
we reject most of these cases by imposing an isolation requirement on the charged lepton
candidates. Thus a fake W boson is reconstructed in the event, whereas the physical process
contains none. The leading-order Feynman diagram of the non-W (QCD) background is
illustrated in Figure 5.3 (b). It is the only background for which the event yield and kinematic
shapes are estimated using a data sample rather than Monte Carlo simulated events.
As we analyze an increasingly larger integrated luminosity, we gain a better understanding
of the process by which jets fake real electrons and of the jet energy resolution. This is why
the non-W (QCD) background is reducible.



Figure 5.3: Leading-order Feynman diagrams for the production and decay of the Z+jets (a)
and non-W (QCD) (b) processes, reducible backgrounds for the WH search [79]. The Z+jets
process can mimic our signal if a charged lepton is not reconstructed and jet energies are
mismeasured. The non-W (QCD) jet production can mimic our signal if one jet is incorrectly
reconstructed as a charged lepton and if jet energies are mismeasured.



5.3.4 W Boson + Jets Production 

There are several W+jets processes. Just as in the case of the signal, these processes contain
a real W boson and two jets. The jets may originate from light flavour partons (up, down,
strange quarks and gluons) or from heavy flavour partons (charm and bottom
quarks). For the signal process, both jets originate from bottom quarks. After we employ
algorithms to identify jets that originate from bottom quarks (b-tagging algorithms), the
signal over background ratio increases.

However, these algorithms are not perfect. On the one hand, light flavour jets may be
incorrectly identified as heavy flavour jets and thus become a background for our signal, which
we denote W + Light Flavour jets or, in short, W+LF or Mistags. Such processes are shown
in Figure 5.4 (c), where two jets are produced from two gluons, and also in Figure 5.4 (a) if
we replace the two bottom quarks with two light quarks. On the other hand, these algorithms
also identify charm quarks as bottom quarks. In other words, although they are called
b-tagging algorithms, in reality they tag heavy flavour quarks, with the tagged jets originating
from either a bottom or a charm quark in roughly equal proportions. Such a process is
presented in Figure 5.4 (b) and is called W + cj. The process of Figure 5.4 (a) where we
replace the two bottom quarks with two charm quarks is called Wcc̄. In this thesis we add
these processes together under the common name of Wcc̄.

Since the b-tagging algorithms may be improved with more development time, and may be
better calibrated in data and Monte Carlo simulated events as we collect a data sample that
corresponds to a larger integrated luminosity, the W+LF and Wcc̄ processes are reducible
backgrounds for our WH signal.

The first irreducible background is the associated production of a W boson and two jets
that originate from bottom quarks, as seen in Figure 5.4 (a). It is called irreducible since even
if we had perfect reconstruction algorithms, the process has exactly the same final state as
our signal. In order to reduce the event prediction of an irreducible background, one has to
look at subtle kinematic differences between the two processes, as just reconstructing correctly
the final state is not enough any more. We call this process the Wbb̄ background. We also use
the denomination of W + Heavy Flavour, W+HF, for the sum of Wcc̄ and Wbb̄.

Figure 5.4: Leading-order Feynman diagrams for several W + jets associated production
and decay processes [79]. Diagram (a) represents the irreducible Wbb̄ background, diagram
(b) represents the reducible Wcc̄ background and diagram (c) represents the reducible W+LF
background.



For each process, the W boson decays leptonically to an electron, muon or tau lepton plus
the corresponding neutrino. Also, each of these processes is simulated with various numbers
of extra generic partons (for W+HF, for example, W + bb̄ + 0p, W + bb̄ + 1p and
W + bb̄ + 2p). All the subsamples have to be added in order to obtain the generic W+HF
background event yield. Some extra jets can be produced by PYTHIA to account for initial
state radiation (ISR) and final state radiation (FSR). In order not to double count jets in
these events, we use the MLM prescription [72].

Each of these samples is simulated using ALPGEN for matrix element generation and 
PYTHIA for parton showering. After a jet-based flavour overlap removal algorithm [80] 
is applied to both W+LF and W+HF samples, they are all added together, weighting each 
sample by its production cross section. 
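
The cross-section weighting of subsamples can be sketched as follows in Python. This is a generic illustration, not the analysis code: each simulated event is given the weight σ × L / N_generated, so that subsamples generated with different numbers of events can be added consistently. The function names and the numbers in the usage example are hypothetical.

```python
def event_weight(cross_section_pb, lumi_pb, n_generated):
    """Per-event weight that normalizes a simulated sample to the data luminosity."""
    return cross_section_pb * lumi_pb / n_generated


def combined_yield(samples, lumi_pb):
    """Sum the expected yields of several subsamples.

    samples: iterable of (cross_section_pb, n_generated, n_selected) tuples,
             one per subsample (for example one per extra-parton multiplicity).
    """
    return sum(
        event_weight(xs, lumi_pb, n_gen) * n_sel
        for xs, n_gen, n_sel in samples
    )
```

For instance, with two hypothetical subsamples `[(2.0, 1000, 10), (1.0, 500, 20)]` and a luminosity of 100 pb⁻¹, the per-event weights are 0.2 in both cases and the combined expected yield is 6 events.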

5.3.5 Single Top Quark Production

The second irreducible background process for this analysis is single top production,
where a top quark is produced by the electroweak force either in association with a bottom
quark (s-channel, which has exactly the same final state as our WH signal and is therefore an
irreducible background) or in association with a bottom quark and a generic quark, where the
generic quark escapes detection (t-channel, in principle a reducible background due to the
presence of the generic jet). The s-channel (t-channel) leading-order Feynman diagrams are
illustrated in the left (right) side of Figure 5.5. Single top production has been observed
(discovered) experimentally at CDF [79].

Figure 5.5: Feynman diagrams for the single top associated production and decay [79],
background processes for our WH search. The s-channel (a) is an irreducible background
because of the presence of two bottom quarks in the final state, while the t-channel (b) is
in principle a reducible background due to the presence of a generic jet in the final state.



Single top events are generated with MADEVENT and the parton showering is done with 
PYTHIA. 

5.3.6 Diboson Production 

The electroweak diboson production processes are the associated production of two W
bosons (WW), two Z bosons (ZZ), or a W boson and a Z boson (WZ). The WW and WZ
processes constitute an irreducible background when one W boson decays leptonically and
the second W boson or the Z boson decays hadronically to two quarks. The ZZ process
constitutes a reducible background when one Z boson decays leptonically to a pair of charged
leptons, one of which is not reconstructed at all, and the second Z boson decays hadronically.
For these processes, both the matrix elements and the parton showering are simulated using
PYTHIA. The Feynman diagrams of these processes can be seen in Figure 5.6. The
non-resonant production of diboson processes predicted by the Standard Model has been
observed at the Tevatron [ST] [52] [55].

Figure 5.6: Feynman diagrams for the electroweak diboson production and decay [79]. The
WW (a) and WZ (b) processes are irreducible backgrounds, whereas the ZZ (c) process is a
reducible background for our WH search.



5.4 Summary 



We started this chapter by describing a three-step methodology to simulate pp̄ collisions
that produce elementary particles that are recorded by the CDF detector. The first step is
to model the pp̄ interaction using Monte Carlo event generators, such as PYTHIA, ALPGEN
and MADEVENT. If quarks are produced, they hadronize and form a shower, which is
modelled by PYTHIA. Finally, the propagation of such showers and other particles through
the CDF detector is modelled using GEANT3.

In the second part of the chapter, we described the signal processes for the WH search.
The main signal process is the WH associated production, where the W boson decays
leptonically and the Higgs boson decays to a bb̄ pair. The second signal process is the ZH
associated production, where the Higgs boson decays to a bb̄ pair and the Z boson decays to
two charged leptons, but one of them is not reconstructed in the CDF detector. The second
process is only a small contribution to the main WH process. We presented the Feynman
diagrams for the signal processes, as well as their cross sections and branching ratios for the
Higgs boson masses studied in this analysis.

We continued by presenting all the main background processes to our WH search. We
divided the backgrounds into two main categories. The reducible backgrounds (top-quark-pair
production, Z-boson-plus-jets production, non-W (QCD) multi-jet production) are those
processes that can in principle be separated from the signal, given datasets with larger
integrated luminosity and better analysis techniques. The irreducible background processes
(W-boson-plus-jets production, diboson production, single-top-quark production) are those
processes that cannot be separated further from the signal even with larger datasets and
improved analysis techniques because they contain the same final state as the signal. We
also presented the Feynman diagrams for the background processes.

In the next chapter we will present the event selection, both online (at the trigger level) and
offline (at analysis level), used to select from the many pp̄ collisions recorded by CDF those
that have the signature presented in this chapter, namely one charged lepton, missing
transverse energy and two jets that originate from bottom quarks.

Chapter 6

Event Selection 



In this analysis we select candidate events consistent with the signature of one charged lepton
(electron or muon), missing transverse energy and two tight jets (at least one required to
be b-tagged by the SecVtx algorithm). We analyze a data sample that corresponds to an
integrated luminosity of 5.7 ± 0.3 fb⁻¹.

In this chapter we will describe the online and offline event selection. The online selection 
is performed by the three levels of the trigger system that selects, every second, about 100 
bunch crossings out of the about 2 million bunch crossings that take place per second. The 
total information recorded by the CDF detector during a bunch crossing is called an event. 
These selected events are saved on magnetic tape and later are analyzed in detail in the 
offline analysis. 

6.1 Online Event Selection 

Events for the tight lepton categories are selected using the charged lepton information.
Events with CEM charged leptons are required to pass the high transverse momentum
CEM trigger. Events with CMUP charged leptons are required to pass the trigger that asks
for ionization in both the CMU and CMP detectors. Events with CMX charged leptons have
to pass the trigger requirements of CMX ionization deposits.

One of our contributions to this analysis is the addition of a loose charged lepton category 
that uses at trigger level the orthogonal information to the charged lepton, namely the 
missing transverse energy and jets. We use three missing transverse energy + jets triggers 
to select isolated track events, thereby increasing the signal acceptance. 

Each trigger has three levels of selection, each more stringent than the previous one. The
selection criteria for all these triggers at every trigger level are presented in detail below.
6.1.1 CEM Trigger



The trigger used for selecting tight central electron candidate events (CEM) is called
"ELECTRON_CENTRAL_18". At L1, it requires a track with pT > 8 GeV/c, a calorimeter tower
with ET > 8 GeV and a ratio between the energy deposited in the hadronic calorimeter and
that in the electromagnetic calorimeter of Ehad/Eem < 0.125. At L2, it requires a calorimeter
cluster with ET > 16 GeV matched to a track with pT > 8 GeV/c. At L3, it requires an
electron candidate with ET > 18 GeV matched to a track with pT > 9 GeV/c. We note that
as we move up the trigger levels, object reconstruction becomes more sophisticated and the
selection cuts are tighter.¹ This is a general feature of triggers.

We use a data sample of W boson events decaying leptonically to an electron and a neutrino 
to measure the efficiency of this trigger. The efficiency of a trigger is the percentage of signal 
Monte Carlo simulated events that meet all of the trigger requirements. We find an average 
efficiency of 0.961 ± 0.001 for the CEM trigger. 

6.1.2 CMUP Trigger 

The trigger used for selecting tight central muon candidate events (CMUP) is called
"MUON_CMUP18". At L1, it requires a track with pT > 4 GeV/c that is matched to
ionization in both the CMU and CMP muon chambers consistent with a muon candidate
with pT > 6 GeV/c. At L2, it requires a track with pT > 15 GeV/c and that the calorimeter
cluster in the direction of the track and the muon stubs have ionization deposits consistent
with a minimum ionizing particle. At L3, it requires a fully reconstructed COT track with
pT > 18 GeV/c that, if extrapolated to the CMU (CMP) detector, matches hits in this
detector within Δx_CMU < 10 cm (Δx_CMP < 20 cm).

We measure the efficiency of the CMUP trigger by collecting a data sample of Z bosons
that decay to a muon-antimuon pair, where one muon is required to pass the trigger
requirements and the second muon is then tested against the trigger selection. We find a
CMUP trigger efficiency of 0.877 ± 0.002.
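The counting behind such a measurement can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the function name and the counts are hypothetical, and the binomial uncertainty is an assumption, since the text does not state how the quoted uncertainty was obtained.

```python
import math


def tag_and_probe_efficiency(n_probes_pass, n_probes_total):
    """Return (efficiency, binomial uncertainty) from probe-muon counts.

    n_probes_total: second muons in Z events whose first muon fired the trigger.
    n_probes_pass:  the subset of those second muons that also pass the trigger.
    """
    if n_probes_total <= 0:
        raise ValueError("need at least one probe muon")
    eff = n_probes_pass / n_probes_total
    # Simple binomial error (an assumption of this sketch).
    err = math.sqrt(eff * (1.0 - eff) / n_probes_total)
    return eff, err


# Hypothetical counts, chosen only to illustrate the arithmetic.
eff, err = tag_and_probe_efficiency(n_probes_pass=877, n_probes_total=1000)
print(f"efficiency = {eff:.3f} +/- {err:.3f}")
```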

6.1.3 CMX Trigger 

The trigger used for selecting another category of tight central muon candidate events, more
forward than the CMUP muons, is called "MUON_CMX18_DPS". At L1, it requires a track
with pT > 8 GeV/c that is matched to ionization in the CMX muon chamber. At L2 and
L3, the criteria are identical to those of the CMUP trigger, except that they refer to the CMX
detector.

¹ We see that only towers and tracks are required at L1, but clusters reconstructed around towers and
matching between clusters and tracks are asked for at L2 and full electron reconstruction is performed at
L3. We also note that the cut values increase as we move from L1 to L3.
In a similar way to the CMUP trigger, we measure a trigger efficiency of 0.902 ± 0.002 for the
CMX trigger.

6.1.4 Triggers for ISOTRK 

Our main original contribution to this analysis is the use of three missing transverse energy
+ jets triggers to select ISOTRK events by using information orthogonal to the charged
lepton, namely the missing transverse energy and the jets. In order to use these
three MET-based triggers, we had to parameterize the trigger efficiency turnon curve as a
function of trigger MET for each of the triggers and each trigger level. We also introduced
a novel method to combine the three different triggers that maximizes the event yield
without taking a logical "OR" between the triggers, which avoids correlations and makes the
systematic uncertainty on the combined trigger efficiency easier to measure correctly.
Due to the long description necessary for this work, we present it in detail in
Appendix A and Appendix B.

6.2 Offline Event Selection 

In this section we will describe the baseline offline event selection, followed by the description
of the b-tagging categories used and the QCD veto used for every charged lepton category.

6.2.1 Baseline Event Selection 

The offline event selection makes sure selected events pass a series of criteria that help 
discriminate the WH signal against a large background. 

• Events must fire the trigger specific to their charged lepton category. There is a special
treatment for the ISOTRK channel, which constitutes one of our original contributions to
this analysis.

• The z coordinate of the primary interaction vertex must be within 60 cm of the centre 
of the detector, the so-called fiducial region. 

• One and only one high transverse momentum isolated charged lepton candidate must
be reconstructed in the event. We require muon (electron) candidates to have pT >
20 GeV/c (ET > 20 GeV). We note that the trigger requirement was pT > 18 GeV/c
(ET > 18 GeV), but we require higher values in the offline selection to make sure that
the event is in the plateau region of any possible trigger turnon.

• The z coordinate difference between the charged lepton track and the primary
interaction vertex has to be smaller than 5 cm (|z_track - z_0| < 5 cm).
• Photon conversion events are vetoed. High energy electrons can emit photons through
bremsstrahlung as they pass through the detector material. As these
photons are also very energetic, they can convert to electron-positron pairs. Events with
such secondary electrons are identified and removed thanks to their two tracks with
a small opening angle emerging from a vertex far away from the primary interaction
vertex.

• Cosmic ray events are vetoed, as discussed in the muon identification Subsection 4.4.2.

• Events are vetoed if the invariant mass between the reconstructed charged lepton and 
any other track with opposite charge in the event falls in the Z boson mass window 
(76 - 106 GeV/c^2).

• The event must present large missing transverse energy (MET > 20 GeV) for all charged
lepton categories. The missing transverse energy is corrected for the position of the
primary interaction vertex, for jet energies and the presence of muons, as described in 
detail in the object reconstruction chapter. 

• The event must contain exactly two tight jets reconstructed using a JETCLU algorithm
with a radius ΔR = 0.4. Each jet is required to have ET > 20 GeV and be in the
central region of the detector (|η| < 2.0).²

• Each event is required to pass some selection criteria designed to reject a significant 
part of the non-W (QCD) background events (QCD veto). The QCD veto is specific 
to every charged lepton category and will be discussed below. 

• Each event must have at least one jet that is b-tagged using the SecVtx algorithm in order
to discriminate against events with light flavour jets. The detailed tag categories are
described below.
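The kinematic part of this list can be sketched as a single predicate. This is only an illustration under stated assumptions: the `Jet` and `Event` containers and their field names are hypothetical, and the trigger, conversion, cosmic ray, QCD veto and b-tagging requirements are left out because they depend on reconstruction details not reproduced here. The thresholds are the ones quoted in the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    et: float   # corrected transverse energy, GeV
    eta: float  # pseudorapidity


@dataclass
class Event:
    z_vertex: float           # primary vertex z, cm
    n_leptons: int            # isolated high-pT charged lepton candidates
    lepton_pt: float          # muon pT (GeV/c) or electron ET (GeV)
    lepton_z: float           # z of the lepton track, cm
    pair_masses: List[float]  # lepton + opposite-charge track masses, GeV/c^2
    met: float                # corrected missing transverse energy, GeV
    jets: List[Jet]


def passes_baseline(ev: Event) -> bool:
    """Apply the kinematic baseline cuts quoted in the text."""
    tight_jets = [j for j in ev.jets if j.et > 20.0 and abs(j.eta) < 2.0]
    in_z_window = any(76.0 <= m <= 106.0 for m in ev.pair_masses)
    return (
        abs(ev.z_vertex) < 60.0                     # fiducial vertex region
        and ev.n_leptons == 1                       # exactly one charged lepton
        and ev.lepton_pt > 20.0                     # lepton threshold
        and abs(ev.lepton_z - ev.z_vertex) < 5.0    # track-vertex matching
        and not in_z_window                         # Z boson veto
        and ev.met > 20.0                           # missing transverse energy
        and len(tight_jets) == 2                    # exactly two tight jets
    )
```

A jet that fails the ET or pseudorapidity requirement is simply not counted as tight in this sketch, so an event with two tight jets and one soft jet still passes.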

6.2.2 b-tagging Categories



Identifying jets that originate from b quarks is essential for this analysis, helping discriminate
against the W+LF background. Since for our WH signal events both jets originate from a
b quark, we ask all events to have at least one jet tagged by SecVtx. Since SecVtx is more
efficient than JetProb and we want to maximize the number of events that have two tagged
jets, we use the following b-tagging categories:

• SecVtx tight + SecVtx tight (SVTSVT): Both jets in the event are tagged by
SecVtx at the tight operating point.

• SecVtx tight + JetProb 5% (SVTJP05): One jet in the event is tagged by SecVtx at
the tight operating point and the other one is not tagged by SecVtx, but is tagged by
the JetProb algorithm at the 5% operating point. A jet is considered b-tagged by JetProb if
it has a probability of less than 5% to emerge from a primary interaction vertex.

² There are two other WH analyses at CDF [17] [84] that studied in addition the sample of events with
exactly three tight jets. The third jet may appear from the underlying event, the initial state radiation or
the final state radiation.
• SecVtx tight (SVTnoJP05): One jet in the event is tagged by SecVtx at the tight
operating point and the other one is tagged neither by SecVtx nor by JetProb.

By the construction of the b-tagging categories, we can see that they are orthogonal. Our
analysis is divided into several channels based on the charged lepton category and the b-tagging
category.
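The orthogonality can be made explicit with a small classifier that returns at most one category per event. This is a sketch under assumptions: each jet is represented by a hypothetical pair of booleans (tagged by SecVtx tight, tagged by JetProb at 5%), and the function name is illustrative.

```python
from typing import Optional, Tuple

# (tagged by SecVtx tight, tagged by JetProb 5%); a hypothetical representation.
JetTags = Tuple[bool, bool]


def btag_category(jet1: JetTags, jet2: JetTags) -> Optional[str]:
    """Return the single b-tagging category of a two-jet event, or None."""
    n_secvtx = int(jet1[0]) + int(jet2[0])
    if n_secvtx == 2:
        return "SVTSVT"
    if n_secvtx == 1:
        # Look at the jet that SecVtx did not tag.
        other = jet2 if jet1[0] else jet1
        return "SVTJP05" if other[1] else "SVTnoJP05"
    # No SecVtx tag: the event stays in the Pretag sample only.
    return None
```

Because every branch returns exactly one label, no event can enter two categories, which is the orthogonality property stated above.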

6.2.3 QCD Veto 

In order to evaluate the non-W (QCD) background event yield, we select events based on a
non-W model specific to each charged lepton category in the analysis. These selection criteria
are very close to the charged lepton ones, with one or two cuts required to fail. In this way
samples enriched in non-W events are selected, which give the shapes of the non-W background
for different kinematic quantities. The normalization of these shapes (the exact yield of
background events) is calculated later on using a fit in data for missing transverse energy
distributions, as we will see in detail in the background estimation chapter.

For the tight central electron (CEM) channel of the analysis, the non-W model is called
"anti-electron" and it tries to model the jets that are incorrectly reconstructed as an electron
candidate. The electron trigger is required to have fired and the candidate is required to pass
all but two of the selection cuts on the following quantities: Ehad/Eem, χ², L_shr, Q · Δx and |Δz|.

For the tight forward electron (PHX) channel of the analysis, the non-W model is called 
"jet-electron". It selects jet candidate events with a transverse energy Et > 20 GeV, 0.80 < 
Ehad /Eem < 0.95 and at least four tracks in the jet. This makes sure that no real electrons 
are selected in this sample. 

For the central muon (CMUP and CMX) channels of the analysis, the non-W model is called 
"non-isolated muons" . It selects muon candidates that pass all the selection criteria but fail 
the "isolation" one. That means that the charged lepton track is required to be surrounded 
by other activity in the COT, which is typical for jets. 

The same criteria are required for the central loose muon candidates (ISOTRK) channel of 
this analysis, where the non-W model is called "non-isolated loose muons" . We select loose 
muon candidates that pass all the selection criteria but fail the "isolation" cut. 

In order to reduce the non-W (QCD) background in data samples before asking any b-tag
requirement (Pretag samples) and in the SVTnoJP05 samples, we apply further selection
cuts that are generically called "QCD veto". The QCD veto is not applied for the
SVTSVT and SVTJP05 tagging categories. The QCD veto applied is specific to each
charged lepton category.

We first define m_T^W as the reconstructed transverse mass of the W boson:

    m_T^W = \sqrt{2 \left( p_T^{lep} \, \slashed{E}_T - \vec{p}_T^{\,lep} \cdot \vec{\slashed{E}}_T \right)}    (6.1)

We then define \Delta\phi_{\slashed{E}_T, jet2} as the azimuthal (φ) angle between the missing transverse energy
vector (\vec{\slashed{E}}_T) and the second most energetic jet in the event (also called the second-leading jet)
and MET_sig as the missing transverse energy significance given by the following formula:

    MET_{sig} = \frac{\slashed{E}_T}{\sqrt{\sum_{jets} C_{JES} \cos^2(\Delta\phi_{\slashed{E}_T, jet_i}) E_T^{jet_i} + \cos^2(\Delta\phi_{vtx\slashed{E}_T, corr\slashed{E}_T}) \left( \slashed{E}_T - \sum_{jets} E_T \right)}}    (6.2)

where C_JES is the jet energy correction factor and \Delta\phi_{vtx\slashed{E}_T, corr\slashed{E}_T} is the azimuthal angle
difference between the uncorrected and corrected missing transverse energy directions.

For the CEM category we require that:

- m_T^W > 20 GeV

- MET_sig < -0.05 · m_T^W + 3.5

- MET_sig < 2.5 - 3.125 · \Delta\phi_{\slashed{E}_T, jet2}

For the CMUP, CMX and ISOTRK categories we require that:

- m_T^W > 20 GeV.
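As an illustration of the transverse mass in Equation 6.1, the sketch below assumes the standard form m_T = sqrt(2 (pT · MET - pT-vector · MET-vector)), with the dot product written in terms of the azimuthal angles of the lepton and of the missing transverse energy. The function and argument names are hypothetical.

```python
import math


def transverse_mass(lep_pt, lep_phi, met, met_phi):
    """Transverse mass of the lepton + missing transverse energy system.

    lep_pt, met:      magnitudes in GeV
    lep_phi, met_phi: azimuthal angles in radians
    """
    # Dot product of the two transverse vectors.
    dot = lep_pt * met * math.cos(lep_phi - met_phi)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * (lep_pt * met - dot)))


# A lepton and MET of 40 GeV each, back to back in azimuth.
print(transverse_mass(40.0, 0.0, 40.0, math.pi))
```

The transverse mass is largest when the lepton and the missing transverse energy are back to back and vanishes when they are collinear, which is why a lower cut on it removes events where the missing energy is aligned with a mismeasured object.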

6.2.4 Further Event Selection for ISOTRK 
Ensuring Exactly One Charged Lepton 



For the ISOTRK category we require a high transverse momentum isolated track in the central
region of the detector (|η| < 1.2), as described in Subsection 4.4.3. Since this is a loose
charged lepton category, we apply further cuts to ensure that the isolated track is genuinely
a charged lepton produced in a W boson decay and that the event is not counted already
in another charged lepton category, since the signature of our analysis contains exactly one
charged lepton. Therefore we apply the following veto cuts:

• Tight Lepton Veto: If the event has a reconstructed CEM, CMUP or CMX lepton 
candidate, that event can not be an ISOTRK event. 

• Tight Jet Veto: If an isolated track is within a cone of radius ΔR = 0.04 of a tight
jet, then most likely the track belongs to a particle produced in a quark decay in a
secondary vertex. We reject the event, since the track is then unlikely to originate from
the primary interaction vertex.

• Two or More Isolated Track Veto: If two or more isolated tracks have been recon- 
structed in the event, we veto the event. 
Ensuring Trigger-Specific Jet Requirements



In addition, the jet selection is specific to each of the three MET-based triggers. As described
in Appendix B, in my new method of combining three MET-based triggers we compute
on an event-by-event basis the trigger efficiency for each trigger and then we set it to zero
if the event fails the jet selection specific to that trigger. Then we choose the largest of 
the weights and we require that it is strictly larger than zero, thus ensuring that the jet 
selection is passed for at least one of the triggers. We weight simulated events by this final 
weight. For data events, we check that the trigger that gave the largest weight fires for the 
particular event. If it does, the event is kept. If it does not, the event is rejected, even if 
it could fire other triggers. We do not even check if other triggers fire and it is this specific 
handling of triggers that allows us to select the maximum signal acceptance without having 
correlations (OR) between the triggers. 
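The logic of this paragraph can be sketched in a few lines. The trigger names, the dictionary-based interface and the function names are hypothetical; the per-trigger efficiencies would come from the parameterized turnon curves, which are not reproduced here.

```python
def combined_trigger_weight(efficiencies, passes_jet_selection):
    """Pick the trigger with the largest efficiency among those whose jet selection passes.

    efficiencies:          {trigger name: per-event trigger efficiency}
    passes_jet_selection:  {trigger name: True if the event passes that trigger's jet cuts}
    Returns (chosen trigger, weight), or (None, 0.0) if no jet selection is passed.
    """
    weights = {
        name: (eff if passes_jet_selection[name] else 0.0)
        for name, eff in efficiencies.items()
    }
    best = max(weights, key=weights.get)
    if weights[best] <= 0.0:
        return None, 0.0
    return best, weights[best]


def keep_data_event(efficiencies, passes_jet_selection, fired):
    """Keep a data event only if the chosen trigger fired; other triggers are ignored."""
    best, _ = combined_trigger_weight(efficiencies, passes_jet_selection)
    return best is not None and fired[best]
```

Simulated events would be weighted by the returned weight, while data events are kept or rejected with `keep_data_event`. Since each event is assigned to exactly one trigger, no event is selected through an "OR" of several triggers.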



6.3 Summary 



In this chapter we have presented the online (trigger) and offline event selection. We started
by describing in detail the charged-lepton inclusive triggers used in this analysis, namely
the CEM, CMUP and CMX triggers. Our original contribution is the isolated track charged
lepton category, for which we introduce a novel method to combine three different
MET-plus-jets-based triggers. We continued by describing the baseline event selection for all
charged lepton categories. We then introduced the three different b-tagging categories employed in
this analysis with the help of two different b-tagging algorithms. We then presented the
selection used to remove a large fraction of the non-W (QCD) background, which is specific
to each charged lepton category. The novel charged lepton category we introduced needs 
further specific event selection, such as vetoing events with two or more charged leptons or 
isolated tracks and ensuring the event passes a jet selection specific to the chosen MET-based 
trigger. 

The event selection criteria described in this chapter are applied both to data and Monte- 
Carlo-simulated events. In the next chapter we will present the methodology and the result 
for the calculation of signal event yield prediction. In the following chapter we will present 
the background estimation method. 
Chapter 7

Higgs Boson Signal Estimation 



In this chapter we use Monte Carlo calculations to estimate the number of WH (ZH) signal 
events in our sample and the systematic uncertainty on this number. 



7.1 Signal Prediction Estimation 



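We calculate the number of predicted signal events from the main signal channel, N_{WH \to l\nu b\bar{b}},
using Equation 7.1. The supplementary contribution from N_{ZH \to l(l) b\bar{b}} is calculated using
Equation 7.2.

    N_{WH \to l\nu b\bar{b}} = \sigma(p\bar{p} \to WH) \cdot \sum_{l=e,\mu,\tau} BR(W \to l\nu) \cdot BR(H \to b\bar{b}) \cdot \epsilon_{W(Z)H \to l\nu b\bar{b}} \cdot \int \mathcal{L}\,dt    (7.1)

    N_{ZH \to l(l) b\bar{b}} = \sigma(p\bar{p} \to ZH) \cdot \sum_{l=e,\mu,\tau} BR(Z \to ll) \cdot BR(H \to b\bar{b}) \cdot \epsilon_{W(Z)H \to l\nu b\bar{b}} \cdot \int \mathcal{L}\,dt    (7.2)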



In these equations, \sigma(p\bar{p} \to W(Z)H) represents the W(Z)H production cross section in
proton-antiproton collisions at a centre-of-mass energy of \sqrt{s} = 1.96 TeV. These values are a
function of the Higgs boson mass and are presented in TableO The \sum_{l=e,\mu,\tau} BR(W \to l\nu) is
0.324 ± 0.003 [1] and represents the branching ratio of a W boson leptonic decay. It is the sum
of the branching ratios of a W boson decay to an electron and an electron neutrino, to a muon
and a muon neutrino, and to a tau lepton and a tau neutrino. The \sum_{l=e,\mu,\tau} BR(Z \to ll) is
0.104 ± 0.001 [1] and represents the branching ratio of a Z boson decay to charged leptons. It
is the sum of the branching ratios of a Z boson decay to an electron-antielectron pair, to a
muon-antimuon pair and to a tau-antitau pair. The BR(H \to b\bar{b}) is the branching ratio of
the Higgs boson decay to a b\bar{b} pair. These values are a function of the Higgs boson mass
and are presented in Table 5.2. The \int \mathcal{L}\,dt is the integrated luminosity. For this analysis,
it has a value of 5.70 ± 0.34 fb^-1. The \epsilon_{W(Z)H \to l\nu b\bar{b}} is the efficiency of the signal selection
and is given by Equation 7.3:

    \epsilon_{W(Z)H \to l\nu b\bar{b}} = \epsilon_{z_0} \cdot \epsilon_{trigger} \cdot \epsilon_{leptonID} \cdot \epsilon^{MC}_{WH \to l\nu b\bar{b}}    (7.3)

In Equation 7.3, \epsilon_{z_0} is the efficiency of the cut that requires the primary vertex position to
be situated within 60 cm of the centre of the detector (|z_0| < 60 cm). The \epsilon_{trigger} is the
efficiency of the requirement that the event fires the required trigger. It is measured in data
for each trigger and therefore is specific to each charged lepton category. The \epsilon_{leptonID} is the
ratio of the lepton identification efficiencies for data and for simulated signal events, also
called the lepton identification scale factor. The \epsilon^{MC}_{WH \to l\nu b\bar{b}} is the fraction of signal events,
after the requirement of |z_0| < 60 cm, that pass all the other kinematic selections of the analysis.
For the various b-tagging categories, this term also takes into account the b-tagging scale
factor between data and simulated signal events.
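The structure of the signal prediction, a product of cross section, branching ratios, integrated luminosity and efficiencies, can be sketched as below. The function and argument names are hypothetical, and the numbers in the usage lines are placeholders chosen for easy arithmetic, not values from this analysis. The only non-trivial step is the unit conversion: a cross section in pb multiplied by a luminosity in fb^-1 needs a factor of 1000.

```python
def expected_signal_events(
    xsec_pb,           # production cross section, pb
    br_vector_boson,   # summed leptonic branching ratio of the W (or Z)
    br_higgs_bb,       # BR(H -> b bbar)
    lumi_fb_inv,       # integrated luminosity, fb^-1
    eff_z0,            # efficiency of the |z0| < 60 cm cut
    eff_trigger,       # trigger efficiency measured in data
    sf_lepton_id,      # lepton identification scale factor
    eff_mc,            # fraction of simulated events passing the remaining selection
):
    """Expected number of signal events as a product of the factors in the text."""
    efficiency = eff_z0 * eff_trigger * sf_lepton_id * eff_mc
    xsec_fb = xsec_pb * 1000.0  # 1 pb = 1000 fb
    return xsec_fb * br_vector_boson * br_higgs_bb * lumi_fb_inv * efficiency


# Placeholder inputs: 1 pb, BRs of 0.5 each, 2 fb^-1 and a total efficiency of 10%.
n_events = expected_signal_events(1.0, 0.5, 0.5, 2.0, 1.0, 1.0, 1.0, 0.1)
print(n_events)
```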

For a Higgs boson mass of 115 GeV/c^2, the computed WH and ZH signal event predictions
for each charged lepton and b-tagging category are presented in Table 7.1. My original
contribution to this analysis, the ISOTRK charged lepton category, increases the WH (ZH)
signal prediction by 33% (66%) over the TIGHT charged lepton category alone.

This increase is easily understood from Figure 7.1, which represents the η-φ distribution of
a sample of Monte Carlo simulated events for the WH signal, where the Higgs boson has
a mass of 115 GeV/c^2, after the full event selection in the Pretag sample. My original
contribution to this analysis, the addition of ISOTRK charged lepton candidates, fills the
gaps, as we see in red. This increases the number of selected events containing charged
leptons and thus the signal event prediction. Unlike the case of muon candidates, the
distribution for electron candidates is smooth. This is why ISOTRK candidates are mostly
muon candidates. The calorimeter detectors still have very small non-instrumented regions,
such as between the wedges of the calorimeter towers. A detailed study [1] identified that
ISOTRK candidates are muon candidates in 85% of cases, electron candidates in 6% of cases
and tau candidates in 7% of cases.

7.2 Systematic Uncertainty on Signal Event Prediction 

In this section we describe the various contributions to the systematic uncertainty we quote 
on the WH and ZH signal event prediction. 
CDF Run II Preliminary 5.7 fb^-1
Number of Expected WH (ZH) Events at M(H) = 115 GeV

Tag Sample   CEM            CMUP          CMX           ISOTRK        % Increase
Pretag       12.07 (0.27)   6.47 (0.48)   3.20 (0.23)   7.34 (0.64)   34 (65)
SVTSVT       1.72 (0.04)    0.86 (0.06)   0.44 (0.03)   1.01 (0.09)   33 (69)
SVTJP05      1.24 (0.03)    0.64 (0.04)   0.32 (0.02)   0.74 (0.06)   34 (67)
SVTnoJP05    4.17 (0.09)    2.23 (0.16)   1.12 (0.08)   2.46 (0.22)   33 (67)

Table 7.1: Expected number of WH and ZH signal events for an assumed Higgs boson
mass of 115 GeV/c^2 and an integrated luminosity of 5.7 fb^-1 as a function of charged
lepton and b-tagging information. The last column represents the percentage increase in
signal prediction due to our original contribution, the ISOTRK charged leptons, over the TIGHT
(CEM + CMUP + CMX) charged leptons alone. In each column, the first values represent the
WH signal and the numbers in brackets represent the ZH signal.
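The last column of Table 7.1 can be reproduced from the other columns as the ISOTRK yield divided by the summed TIGHT (CEM + CMUP + CMX) yield. The short sketch below does this arithmetic; the dictionary layout and function name are illustrative choices, while the numbers are those of the table.

```python
# (WH, ZH) expected events per lepton category, copied from Table 7.1.
TABLE_7_1 = {
    "Pretag":    {"CEM": (12.07, 0.27), "CMUP": (6.47, 0.48), "CMX": (3.20, 0.23), "ISOTRK": (7.34, 0.64)},
    "SVTSVT":    {"CEM": (1.72, 0.04),  "CMUP": (0.86, 0.06), "CMX": (0.44, 0.03), "ISOTRK": (1.01, 0.09)},
    "SVTJP05":   {"CEM": (1.24, 0.03),  "CMUP": (0.64, 0.04), "CMX": (0.32, 0.02), "ISOTRK": (0.74, 0.06)},
    "SVTnoJP05": {"CEM": (4.17, 0.09),  "CMUP": (2.23, 0.16), "CMX": (1.12, 0.08), "ISOTRK": (2.46, 0.22)},
}


def percent_increase(row, process_index):
    """ISOTRK yield relative to the TIGHT yield, in percent.

    process_index: 0 for WH, 1 for ZH.
    """
    tight = sum(row[cat][process_index] for cat in ("CEM", "CMUP", "CMX"))
    return 100.0 * row["ISOTRK"][process_index] / tight


for tag, row in TABLE_7_1.items():
    print(tag, round(percent_increase(row, 0)), round(percent_increase(row, 1)))
```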
Figure 7.1: Scatter plot of the charged lepton candidates in our analysis after the full event
selection in the Pretag sample for Monte Carlo simulated events for the WH signal, where the
Higgs boson has a mass of 115 GeV/c^2. The left plot shows that the gaps in the η-φ coverage
of the CMUP and CMX muon candidates, due to non-instrumented regions in the muon detectors,
are filled by the ISOTRK charged lepton candidates. The right plot shows the η-φ
distribution of the CEM electron candidates. Since the CEM electron candidates present a
smooth distribution, ISOTRK candidates are mostly muon candidates.



7.2.1 Trigger 



The systematic uncertainty on the trigger efficiency is measured by selecting data events with
an orthogonal trigger and then measuring the fraction of events that fire our trigger of interest.
This process is done for each charged lepton category. The systematic uncertainties are
measured to be < 1.0% for the TIGHT charged leptons. For the ISOTRK charged lepton
the analysis employs the new trigger parametrization described in Appendix A and the novel
method to combine triggers described in Appendix B, with the method to compute the
systematic uncertainty described in Section A.5. We measured a trigger systematic uncertainty
for the ISOTRK category of 3%.
7.2.2 Lepton Identification



The lepton identification systematic uncertainty is measured by comparing a data sample
highly enriched in Z boson events with a Z boson sample of Monte Carlo simulated events
using PYTHIA as event generator. Z boson decays to charged leptons are used to evaluate
the lepton identification systematic uncertainty for each charged lepton category. Details
are presented in Subsection 4.4.4.

7.2.3 Initial and Final State Radiation 

The systematic uncertainty due to the effect of the initial state radiation (ISR) and final state 
radiation (FSR) is calculated by changing in the Monte Carlo simulation the parameters 
related to ISR and FSR to their half and double values. The systematic uncertainty is 
quoted as half of the difference between the signal event prediction with the two changes 
0. 

7.2.4 Parton Distribution Functions 

Another systematic uncertainty source is the fact that the parton distribution functions 
(PDFs) are not perfectly known, neither for the protons, nor the antiprotons. We first 
compute three systematic uncertainties that we use to compute the final PDF systematic 
uncertainty. 

The PDFs for the simulations used in this analysis use CTEQ5L [7T] , which is parameterized 
using 20 eigenvectors. We weight the nominal Monte Carlo simulated events for each of the 
20 eigenvectors and for each we compute a signal event prediction. The first PDF systematic
uncertainty is quoted as the quadrature sum of the differences between the nominal
and weighted event predictions.

We also compute the signal event prediction using MRST72 [85] as PDF generator. The 
absolute value of the difference between the CTEQ5L and MRST72 signal prediction is 
quoted as the second PDF systematic uncertainty. 

In addition we compute the signal event prediction using PDFs that are generated using
different values of the coupling constant of the strong force, i.e. different QCD energy scales.
We use MRST72 (Λ_QCD = 228 MeV) and MRST75 (Λ_QCD = 300 MeV). The absolute value
of the difference between the MRST72 and MRST75 signal predictions is quoted as the third
PDF systematic uncertainty.

The final PDF systematic uncertainty is computed by taking the larger of the first and the
second PDF systematic uncertainties and adding it in quadrature with the third one [3].
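The combination rule described in this subsection can be sketched as follows. The function and argument names are hypothetical and the inputs in the usage line are placeholders chosen so that the arithmetic is easy to follow; they are not values from the analysis.

```python
import math


def pdf_systematic(eigenvector_shifts, cteq_minus_mrst, mrst72_minus_mrst75):
    """Combine the three PDF uncertainties as described in the text.

    eigenvector_shifts:   differences between nominal and reweighted predictions,
                          one per eigenvector
    cteq_minus_mrst:      CTEQ5L minus MRST72 prediction
    mrst72_minus_mrst75:  MRST72 minus MRST75 prediction
    """
    first = math.sqrt(sum(d * d for d in eigenvector_shifts))
    second = abs(cteq_minus_mrst)
    third = abs(mrst72_minus_mrst75)
    # Larger of the first two, added in quadrature with the third.
    return math.hypot(max(first, second), third)


# Placeholder shifts: first = 5, second = 2, third = 12.
print(pdf_systematic([3.0, 4.0], 2.0, -12.0))
```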
7.2.5 Jet Energy Scale



First we compute the nominal WH signal event prediction using a Higgs boson mass of
115 GeV/c^2. Then, for the same Higgs boson mass, we vary the jet energy scale [66] up
and down by one standard deviation. We compute the signal event prediction for these two
cases. We take the largest deviation from the nominal signal prediction as the jet energy
scale rate systematic uncertainty.
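A minimal sketch of this rate uncertainty, assuming it is quoted relative to the nominal prediction (the text does not say whether an absolute or relative value is used). The function name and the yields in the usage line are hypothetical.

```python
def jes_rate_systematic(nominal, jes_up, jes_down):
    """Largest deviation of the shifted predictions from nominal, as a fraction of nominal."""
    if nominal <= 0.0:
        raise ValueError("nominal prediction must be positive")
    return max(abs(jes_up - nominal), abs(jes_down - nominal)) / nominal


# Hypothetical yields: nominal 10.0, +1 sigma 10.3, -1 sigma 9.8.
print(jes_rate_systematic(10.0, 10.3, 9.8))
```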

In this analysis we also consider a jet energy scale shape systematic uncertainty. We scale the
jet energies up and down by one standard deviation and then we propagate this to all reconstructed
variables, including the final analysis discriminant, described in detail in Chapter 9 and
denoted the BNN output. Both the central and the alternate plus and minus BNN shapes
are used in the limit calculation, as described in Section fl . 31. The top plots in Figure 7.2
(7.3) present the BNN shapes for JES zero, JES plus and JES minus for each of the b-tagging
categories (SVTSVT, SVTJP05 and SVTnoJP05, as described in Subsection 6.2.2) and the
TIGHT (ISOTRK) charged lepton category. The bottom plots in the same figures represent
the ratio of the JES plus and JES minus shapes to JES zero.




[Figure panels: WH115 BNN Output (115 GeV/c^2); legend: Nominal, JES +1σ, JES -1σ]
Figure 7.2: TIGHT BNN output shape for default and ±1 sigma JES for WH signal
(m_H = 115 GeV/c^2). From left to right SVTSVT, SVTJP05 and SVTnoJP05, respectively.
The horizontal axis represents the value of the BNN output, which is the final analysis
discriminant.



7.2.6 b-tagging Scale Factor



The systematic uncertainty on the b-tagging scale factor between data and Monte Carlo
simulated events is computed in Section 4.7.
Figure 7.3: ISOTRK BNN output shape for default and ±1 sigma JES for WH signal
(m_H = 115 GeV/c^2). From left to right SVTSVT, SVTJP05 and SVTnoJP05, respectively.
The horizontal axis represents the value of the BNN output, which is the final analysis
discriminant.

7.2.7 Integrated Luminosity 

The integrated luminosity of 5.7 fb^-1 used in this analysis has been measured with the
Cherenkov Luminosity Counter, which has been described in detail in Subsection 3.3.2. The
measurement has a systematic uncertainty of 6% [42].

7.2.8 Systematic Uncertainty Values 

The computed values for the total systematic uncertainties are presented in Table 7.2 for the
tight central charged lepton categories (CEM, CMUP and CMX) and in Table 7.3 for the
loose central charged lepton category (ISOTRK).
b-tagging category   Lepton ID   Trigger   ISR/FSR/PDF   JES    b-tagging   Total
SVTSVT               2%          <1%       4.9%          2.0%   13.6%       15.9%
SVTJP05              2%          <1%       4.9%          2.8%   8.1%        10.1%
SVTnoJP05            2%          <1%       3.0%          2.3%   4.3%        6.1%

Table 7.2: Systematic uncertainty values on the signal event prediction for the tight central
charged lepton categories (CEM, CMUP, CMX) for each b-tagging category.



b-tagging category   Lepton ID   Trigger   ISR/FSR/PDF   JES    b-tagging   Total
SVTSVT               4.5%        3%        7.1%          1.7%   8.6%        12.5%
SVTJP05              4.5%        3%        6.4%          2.4%   8.1%        11.9%
SVTnoJP05            4.5%        3%        8.4%          4.7%   4.3%        11.8%

Table 7.3: Systematic uncertainty values on the signal event prediction for the loose central
charged lepton category (ISOTRK) for each b-tagging category.
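The text does not state how the Total column is obtained. The sketch below assumes it is the quadrature sum of the five listed components, which does reproduce the ISOTRK totals of Table 7.3 to the quoted precision. The dictionary layout and function name are illustrative; the numbers are those of the table.

```python
import math

# Lepton ID, Trigger, ISR/FSR/PDF, JES, b-tagging (in percent), from Table 7.3.
ISOTRK_COMPONENTS = {
    "SVTSVT":    [4.5, 3.0, 7.1, 1.7, 8.6],
    "SVTJP05":   [4.5, 3.0, 6.4, 2.4, 8.1],
    "SVTnoJP05": [4.5, 3.0, 8.4, 4.7, 4.3],
}


def quadrature_sum(values):
    """Square root of the sum of squares, for uncorrelated uncertainties."""
    return math.sqrt(sum(v * v for v in values))


for tag, components in ISOTRK_COMPONENTS.items():
    print(tag, round(quadrature_sum(components), 1))
```

Adding the components in quadrature treats them as uncorrelated, which is the assumption behind this check.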



7.3 Summary 

The first part of this chapter has presented the calculation of event yield predictions for the 
WH and ZH signal processes, which is the multiplication of the integrated luminosity with 
the signal cross section, decay branching ratios and efficiencies for the various selection cuts. 
For the novel charged lepton category we introduced, we explained how real charged leptons 
that go towards non-instrumented regions of the detector are recovered and thus increase 
the signal event yield. 

In the second part of the chapter we presented the signal systematic uncertainties that we 
take into account in this analysis, namely the trigger, lepton identification, initial and final 
state radiation, parton distribution functions, jet energy scale, b-tagging scale factor and
integrated luminosity. 
Chapter 8

Background Estimation 



Our background estimation method assumes that we have correctly identified all the
processes that mimic our WH signal: top quark pair, single top (both s- and t-channel), diboson
(WW, WZ and ZZ), Z+jets (both Z + Heavy Flavour (HF) jets and Z + Light Flavour
(LF) jets), W+jets (both W+LF and W+HF) and non-W (QCD) production. In Chapter 5
we described in detail our signal and background processes, especially their Feynman
diagrams, signatures and Monte Carlo generators used to simulate the events. In this chapter
we will describe how data and Monte Carlo generated events are used to compute the
contribution of each background type in our data sample.

We recall that we call the "pretag" sample the events that pass all the event selection
requirements, but are not required to pass any b-tagging requirement. The b-tagging categories
(SVTSVT, SVTJP05, SVTnoJP05) are orthogonal to each other and each is a sub-sample
of the pretag sample. The Pretag sample is dominated by background events, with very
little signal contribution. This is why we use it as a "control region" or "sideband" to check
that the background modelling is correct and in agreement with the data events. Table 8.1
gives the observed number of events for the Pretag data sample. Also, we use the pretag
sample to measure the fraction of the data events that are non-W (QCD) and W+jets (both
LF and HF) events, which is then extrapolated to the "tagged" categories signal region, as
we will discuss in the next sections.



Category   Pretag Data Observation
TIGHT      83788
ISOTRK     21486

Table 8.1: Observed number of events for the Pretag data sample for the TIGHT and
ISOTRK charged lepton categories.
8.1 Top Quark and Other Electroweak Backgrounds



We first compute the expected number of background events in our data sample from the
background processes that are well modelled at tree level by Monte Carlo simulations (top
quark pair production, single top, diboson and Z+jets). We use the same procedure and
formulae as those used for the signal acceptance calculation, described in detail in Chapter 7
and summarized by the following formula:

    N_{p\bar{p} \to X} = \epsilon_{event} \cdot \epsilon_{tag} \cdot \sigma_{p\bar{p} \to X} \cdot \int \mathcal{L}\,dt.    (8.1)

We denote the estimated number of events due to electroweak processes (diboson or Z+jets)
as N_EWK and the estimated number of events due to top quark processes (top quark pair
and single top) as N_top. They will be used in the background estimation of QCD and
W+jets processes, as described below.

We use the cross sections and branching ratios specific to each background process, as seen
in Table 8.2.



Process                   Theoretical Cross Section

WW                        11.34 ± 0.70 pb
WZ                        3.22 ± 0.30 pb
ZZ                        1.20 ± 0.20 pb
Single Top s-channel      1.05 ± 0.07 pb
Single Top t-channel      2.10 ± 0.19 pb
Z + jets                  787.4 ± 85.0 pb
$t\bar{t}$                7.04 ± 0.44 pb



Table 8.2: NLO theoretical cross sections and uncertainties used in the computation of
predicted background events for processes that are well modelled at tree level (top quark
pair production, single top s-channel, single top t-channel, diboson (WW, WZ, ZZ) and
Z+jets). The simulated events use a top quark mass of $m_t = 172.5$ GeV/$c^2$.



The efficiency term $\epsilon_{event}$ multiplies all the individual efficiencies, except the
b-tagging scale factor. The efficiency term $\epsilon_{tag}$ is the b-tagging scale factor for
the event. For the pretag sample, $\epsilon_{tag} = 1$.

We measure the systematic uncertainty on the background normalization values due to the
b-tagging efficiency by varying the scale factor and mistag probabilities within one standard
deviation of their values and then reproducing this entire procedure.
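
As a rough numerical illustration of Equation 8.1, the sketch below converts a cross section
and an integrated luminosity into an expected event count. The top quark pair cross section
(7.04 pb) and the integrated luminosity (5.7 fb^-1) are taken from the text; the efficiency
values and the function name `expected_events` are hypothetical placeholders and not numbers
from this analysis.

```python
# Sketch of Equation 8.1: N = eps_event * eps_tag * sigma * integrated luminosity.
# The efficiencies used below are hypothetical placeholders, not analysis values.

PB_PER_FB = 1000.0  # 1 fb^-1 corresponds to 1000 pb^-1


def expected_events(sigma_pb, lumi_fb, eps_event, eps_tag=1.0):
    """Return the predicted number of events for one process."""
    lumi_pb = lumi_fb * PB_PER_FB  # express the luminosity in pb^-1 to match sigma
    return eps_event * eps_tag * sigma_pb * lumi_pb


# Top quark pair production: sigma from Table 8.2, luminosity of the dataset.
n_produced = expected_events(sigma_pb=7.04, lumi_fb=5.7, eps_event=1.0)

# Hypothetical 1% event selection efficiency in the pretag sample (eps_tag = 1).
n_pretag = expected_events(sigma_pb=7.04, lumi_fb=5.7, eps_event=0.01)

print(n_produced, n_pretag)
```

With unit efficiency the function simply returns the number of produced events, which makes
it easy to see how strongly the selection and tagging efficiencies reduce the yield.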
8.2 Pretag: W+Jets and QCD Events Faking a W boson



As described in detail in Subsection 6.2.3, we developed a QCD template (model) for
each of the charged lepton categories^1 by selecting events that fail one or two relevant cuts
of the charged lepton selection. These samples are enriched in fake W boson candidates and
therefore represent a model for the non-W (QCD) processes. However, the QCD background
is the most poorly predicted and the least understood, which requires us to assign a large
systematic uncertainty to its normalization estimate.

We fit the missing transverse energy ($\not\!E_T$) distribution in pretag data to a sum of
background $\not\!E_T$ shapes. In the fit the electroweak ($N_{EWK}$) and top ($N_{TOP}$)
normalization values are fixed, but the normalizations of QCD ($N_{QCD}$) and W+jets
($N_{W+jets}$) are allowed to float. The fit is performed in the range
0 GeV < $\not\!E_T$ < 120 GeV using the $\not\!E_T$ distributions of the QCD and W+jets samples,
which are obtained after removing the $\not\!E_T$ > 20 GeV cut from the standard event selection.
The normalizations of the QCD and W+jets samples are then fixed by the fit.
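
The idea of this two-template fit can be sketched with toy histograms. The sketch below is
only an illustration: it uses an unweighted linear least-squares fit, which is an assumption
made for brevity and not necessarily the fit method used in the analysis, and all histogram
contents and the function name `fit_two_templates` are hypothetical.

```python
# Sketch of a two-template MET fit: data ~ fixed + a * qcd + b * wjets.
# Unweighted least squares is an assumption; all bin contents are toy values.


def fit_two_templates(data, fixed, qcd, wjets):
    """Return the scale factors (a, b) of the QCD and W+jets templates."""
    # Subtract the fixed (electroweak + top) contribution bin by bin.
    y = [d - f for d, f in zip(data, fixed)]
    # Normal equations of the two-parameter linear least-squares problem.
    s_qq = sum(q * q for q in qcd)
    s_ww = sum(w * w for w in wjets)
    s_qw = sum(q * w for q, w in zip(qcd, wjets))
    s_qy = sum(q * v for q, v in zip(qcd, y))
    s_wy = sum(w * v for w, v in zip(wjets, y))
    det = s_qq * s_ww - s_qw * s_qw
    if det == 0:
        raise ValueError("templates are degenerate")
    a = (s_qy * s_ww - s_wy * s_qw) / det
    b = (s_wy * s_qq - s_qy * s_qw) / det
    return a, b


# Toy histograms in six MET bins of 20 GeV each, covering 0-120 GeV.
qcd = [50.0, 30.0, 10.0, 5.0, 3.0, 2.0]      # falls steeply with MET
wjets = [5.0, 20.0, 40.0, 25.0, 7.0, 3.0]    # peaks at intermediate MET
fixed = [1.0, 2.0, 3.0, 2.0, 1.0, 1.0]       # electroweak + top, held fixed
data = [f + 2.0 * q + 3.0 * w for f, q, w in zip(fixed, qcd, wjets)]

a, b = fit_two_templates(data, fixed, qcd, wjets)

# QCD fraction after the MET > 20 GeV cut: drop the first bin (0-20 GeV).
f_qcd = a * sum(qcd[1:]) / sum(data[1:])
print(a, b, f_qcd)
```

Because the toy data are built as an exact combination of the templates, the fit recovers the
scale factors used to generate them; with real data the floated normalizations absorb the
statistical fluctuations instead.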

We define and extract from the fit the fraction of QCD events ($F_{QCD}^{pretag}$) in the total
number of events in the pretag data sample after the standard selection cut
$\not\!E_T$ > 20 GeV is applied, which we call $N^{pretag}$ (Equation 8.2):

$$F_{QCD}^{pretag} = \frac{N_{QCD}^{pretag}(\not\!E_T > 20\ \mathrm{GeV})}{N^{pretag}(\not\!E_T > 20\ \mathrm{GeV})}. \qquad (8.2)$$

Then the number of QCD events in the pretag sample ($N_{QCD}^{pretag}$) is given by:

$$N_{QCD}^{pretag} = N^{pretag} \cdot F_{QCD}^{pretag}. \qquad (8.3)$$



We are careful to check how the QCD normalization changes when we modify the histogram
binning, the $\not\!E_T$ fit interval, the $\not\!E_T$ cut used in the definition of $F_{QCD}$, as
well as the non-W models used for CEM, CMUP, CMX, ISOTRK and PHX. As a conclusion of these
studies, we assign a conservative 40% systematic uncertainty to the QCD normalization. Despite
this very large uncertainty, the total number of QCD events is relatively small in the final
sample, thanks to the $\not\!E_T$ cut we apply in the final analysis. This permits the analysis to
remain sensitive to the WH process.

1 Each charged lepton category is susceptible to different kinds of faking due to the different
reconstruction criteria.

At this stage we have estimated the number of electroweak, top quark and QCD events in
the pretag data set. The remaining events in the sample are therefore W+jets events, as given
by the following formula:

$$N_{W+jets}^{pretag} = N^{pretag} \cdot (1 - F_{QCD}^{pretag}) - N_{EWK}^{pretag} - N_{TOP}^{pretag}. \qquad (8.4)$$
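
The pretag bookkeeping of Equations 8.3 and 8.4 amounts to simple arithmetic, sketched below.
Only the TIGHT pretag event count (83788, from Table 8.1) is taken from the text; the QCD
fraction and the electroweak and top yields are hypothetical values chosen for illustration,
as is the function name `pretag_yields`.

```python
# Sketch of the pretag bookkeeping (Equations 8.3 and 8.4).
# f_qcd, n_ewk and n_top are hypothetical; n_pretag is the TIGHT count of Table 8.1.


def pretag_yields(n_pretag, f_qcd, n_ewk, n_top):
    """Return (N_QCD, N_W+jets) for the pretag sample."""
    n_qcd = n_pretag * f_qcd                             # Equation 8.3
    n_wjets = n_pretag * (1.0 - f_qcd) - n_ewk - n_top   # Equation 8.4
    return n_qcd, n_wjets


n_pretag, n_ewk, n_top = 83788, 3000.0, 1500.0
n_qcd, n_wjets = pretag_yields(n_pretag, f_qcd=0.10, n_ewk=n_ewk, n_top=n_top)
print(n_qcd, n_wjets)
```

By construction the four components add up to the observed pretag count, which is why the
W+jets normalization inherits the uncertainties of the other three estimates.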

The W+jets sample is divided in two: W+heavy flavour (W+HF) and W+light flavour
(W+LF).

The W+LF background contains light flavour jets that are incorrectly tagged as originating
from a heavy flavour quark (b or c) by the b-tagging algorithm. This phenomenon is called
"mistag", as discussed in Section 4.7.

8.3 Tag: W+Jets and QCD Events Faking a W boson

Now we reproduce the procedure described above for each of the b-tagged categories. The
background templates are all weighted by $\epsilon_{tag}$, as described in the section above.

8.3.1 QCD Background 

For a given category the QCD normalization is given by Equation 8.5 (to be compared
with Equation 8.3 for pretag) and the W+jets normalization is given by Equation 8.6 (to be
compared with Equation 8.4 for pretag).

$$N_{QCD}^{tag} = N^{tag} \cdot F_{QCD}^{tag}. \qquad (8.5)$$

$$N_{W+jets}^{tag} = N^{tag} \cdot (1 - F_{QCD}^{tag}) - N_{EWK}^{tag} - N_{TOP}^{tag}. \qquad (8.6)$$

8.3.2 W+HF 

The normalization of W+HF in a tagged sample ($N_{W+HF}^{tag}$) is computed starting from the
normalization of W+jets in the pretag sample, which is multiplied by the fraction of pretag
events with jets matched to heavy flavour quarks ($F_{HF}$), by the scale factor between data and
Monte Carlo for the heavy flavour fraction ($K = 1.4 \pm 0.4$) and by the b-tagging efficiency:

$$N_{W+HF}^{tag} = N_{W+jets}^{pretag} \cdot (F_{HF} \cdot K) \cdot \epsilon_{tag}. \qquad (8.7)$$

The heavy flavour fraction $F_{HF}$ is measured from Monte Carlo simulated events of all the
processes that produce one and only one real W boson. This quantity does not agree exactly
with the value measured in data, and their ratio is represented by the scale factor K, which acts
as a correction for the heavy flavour fraction $F_{HF}$ that is applied to the Monte Carlo simulated
events of our analysis. The K factor is measured in the 1-jet bin of the analysis, which is
a "sideband" and not a signal region and has the largest statistics of all the jet bins. We
then assume that K has the same value across all the jet bins.
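
To illustrate how the uncertainty on K propagates through Equation 8.7, the sketch below
evaluates the W+HF yield for the central, upper and lower values of K. The value
K = 1.4 ± 0.4 is quoted in the text; the pretag W+jets yield, the heavy flavour fraction and
the tagging efficiency are hypothetical placeholders, as is the function name `w_hf_yield`.

```python
# Sketch of Equation 8.7: N_W+HF(tag) = N_W+jets(pretag) * (F_HF * K) * eps_tag.
# K = 1.4 +/- 0.4 is quoted in the text; every other input is hypothetical.


def w_hf_yield(n_wjets_pretag, f_hf, k, eps_tag):
    """Return the W+HF normalization in a tagged sample."""
    return n_wjets_pretag * (f_hf * k) * eps_tag


K_CENTRAL, K_ERR = 1.4, 0.4
inputs = dict(n_wjets_pretag=70000.0, f_hf=0.05, eps_tag=0.30)

n_central = w_hf_yield(k=K_CENTRAL, **inputs)
n_up = w_hf_yield(k=K_CENTRAL + K_ERR, **inputs)
n_down = w_hf_yield(k=K_CENTRAL - K_ERR, **inputs)

# The yield is linear in K, so the relative uncertainty equals K_ERR / K_CENTRAL.
rel_unc = (n_up - n_down) / (2.0 * n_central)
print(n_central, rel_unc)
```

Since the yield is proportional to K, the uncertainty on K alone contributes a relative
uncertainty of 0.4/1.4, or about 29%, to the W+HF normalization.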

8.3.3 W+LF 

The normalization of W+LF in a tagged sample ($N_{W+LF}^{tag}$) is computed starting from the
normalization of W+jets in the pretag sample, which is multiplied by the fraction of pretag
events that is not matched to heavy flavour ($1 - F_{HF} \cdot K$) and by the overall fake tag rate
($\epsilon_{mistag}$), also called the mistag rate:

$$N_{W+LF}^{tag} = N_{W+jets}^{pretag} \cdot (1 - F_{HF} \cdot K) \cdot \epsilon_{mistag}. \qquad (8.8)$$

The mistag rate is measured for each b-tagging algorithm in generic light jet data samples.
Light flavour jets are sometimes tagged as heavy flavour jets because of the finite tracking
resolution. We measure the mistag rate using the negatively tagged jets, as
explained in Section 4.7. In the end we produce a function that takes various jet quantities
as inputs and returns the mistag probability for that jet. We call this function a mistag matrix.
The mistag matrix for SecVtx is a function of jet $\eta$, jet $E_T$, number of interaction vertices
in the event, jet track multiplicity and the scalar sum of all the transverse energy in the event.
The mistag matrix for JetProb uses the same information and, in addition, uses the z coordinate
of the primary interaction vertex.

Once we obtain the mistag rate on a jet-by-jet basis, we need to do the same thing for the
entire event. We add the mistag jet rates to obtain an event mistag rate. We sum all the
event mistag probabilities to obtain the total mistag probability, $\epsilon_{mistag}$.

We measure the systematic uncertainty on the normalization of the W+LF background by 
fluctuating the per-jet tag rates by one standard deviation and then reproducing the entire 
procedure. 
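
The event-level mistag bookkeeping described in this subsection can be sketched as follows.
The toy mistag matrix is entirely hypothetical: the real one is a multi-dimensional lookup
measured in data, and it depends on more quantities than the two used here. Dividing the
summed event probabilities by the number of events to obtain a rate is also an assumption
about the normalization.

```python
# Sketch of the event-level mistag bookkeeping. The toy "mistag matrix" is
# hypothetical; the measured one depends on more jet and event quantities.


def toy_mistag_matrix(jet_et, jet_eta):
    """Hypothetical per-jet mistag probability rising slowly with jet E_T."""
    base = 0.005 + 0.0001 * jet_et
    return base * (1.5 if abs(jet_eta) > 1.0 else 1.0)


def event_mistag_probability(jets):
    """Add the per-jet mistag rates of one event, as described in the text."""
    return sum(toy_mistag_matrix(et, eta) for et, eta in jets)


def average_mistag_rate(events):
    """Sum the event probabilities and normalize to the number of events.

    The normalization by len(events) is an assumption made in this sketch.
    """
    total = sum(event_mistag_probability(jets) for jets in events)
    return total / len(events)


# Each event is a list of (E_T in GeV, eta) pairs for its two tight jets.
events = [
    [(50.0, 0.2), (30.0, -0.5)],
    [(100.0, 1.2), (40.0, 0.1)],
]
eps_mistag = average_mistag_rate(events)
print(eps_mistag)
```

Varying the per-jet rates returned by the matrix up and down by one standard deviation and
repeating the sum is then enough to propagate the mistag uncertainty to the W+LF yield.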

8.4 Background Fits and Event Counts 

Figures 8.1 through 8.4 show the results of fitting the $\not\!E_T$ distribution in the Pretag and
SVTSVT, SVTJP05 and SVTnoJP05 tag regions for each charged lepton category. We
note that we fit separately for the CEM, CMUP and CMX charged lepton categories and only later
add the templates and event counts. We set a default 40% systematic uncertainty on the
QCD (non-W) background normalization. Due to low statistics, for ISOTRK SVTSVT and
SVTJP05 we use a 100% systematic uncertainty.

Table 8.3 shows the event counts for each of the six analysis channels given by the two charged
lepton categories (TIGHT and ISOTRK) and the three b-tagging categories (SVTSVT,
SVTJP05 and SVTnoJP05). The event counts are estimated using Monte-Carlo simulated
samples for the signal and background processes and the data sample for the real observed
events.




Figure 8.1: QCD (Non-W) fraction estimate for the Pretag sample. The horizontal axis
represents the fully corrected MET. The QCD background is represented in pink, the
remainder of backgrounds in green. The dashed line represents the sum of all the backgrounds
and the points represent the data. The figures represent (left to right and top to bottom)
the CEM, CMUP, CMX and ISOTRK charged lepton categories.



8.5 Summary 



This chapter has presented the methodology used to calculate the event yield prediction for the
various background processes. Since any blind analysis compares a control sample with an
analysis sample, we define the control sample as the Pretag sample and the signal samples
as each of the three b-tagging categories. Although the signal samples are included in the
Pretag sample, the Pretag sample is much larger, so treating the control sample and the
signal samples as orthogonal is a very good approximation. We then presented
the computation of the top quark and electroweak background event yields by using the
same methodology used for the signal processes. Only the W+jets and the QCD
background yields are determined from a missing transverse energy fit to the data
distribution. A slightly more complex procedure is used for the b-tagging categories than for
the Pretag sample. We presented the fit plots in order to demonstrate the quality of the
fits. We concluded with the table of event yields for the signal and
background processes, as well as the measured event counts, for each charged lepton and
b-tagging category, together with the systematic uncertainty on these values.
In all categories, the background prediction and data agree within the systematic uncertainty
on the background.
[Figure panels. Horizontal axis: Missing Transverse Energy [GeV]. Recoverable panel titles:
Lepton CMX, jet bin 2: fQCD = 13.7%; Lepton ISOTRK, jet bin 2: fQCD = 20.6%.]



Figure 8.2: QCD (Non-W) fraction estimate for the SVTSVT sample. The horizontal axis
represents the fully corrected MET. The QCD background is represented in pink, the
remainder of backgrounds in green. The dashed line represents the sum of all the backgrounds
and the points represent the data. The figures represent (left to right and top to bottom)
the CEM, CMUP, CMX and ISOTRK charged lepton categories.







Figure 8.3: QCD (Non-W) fraction estimate for the SVTJP05 sample. The horizontal axis
represents the fully corrected MET. The QCD background is represented in pink, the
remainder of backgrounds in green. The dashed line represents the sum of all the backgrounds
and the points represent the data. The figures represent (left to right and top to bottom)
the CEM, CMUP, CMX and ISOTRK charged lepton categories.
CDF Run II Preliminary, 5.7 fb^-1

Predicted number of events, uncertainty and fractional uncertainty (Err)

Sample    SVTSVT            Err     SVTJP05           Err     SVTnoJP05         Err

TIGHT Charged Lepton

DiTop     [values illegible in source]
STopS     22.04 ± 2.62      0.12    16.32 ± 1.42      0.087   56.33 ± 4.78      0.085
STopT     5.96 ± 0.8        0.13    5.98 ± 0.77       0.13    92.16 ± 9.42      0.1
WW        0.8 ± 0.2         0.25    2.97 ± 0.96       0.32    84.57 ± 11.8      0.14
WZ        5.17 ± 0.77       0.15    4.15 ± 0.49       0.12    23.38 ± 2.64      0.11
ZZ        0.19 ± 0.03       0.16    0.18 ± 0.03       0.17    0.83 ± 0.08       0.096
Zjets     3.19 ± 0.44       0.14    4.08 ± 0.56       0.14    62.48 ± 8.47      0.14
Wbb       117.98 ± 48.1     0.41    104.35 ± 42.1     0.4     694.19 ± 279      0.4
Wcc       13.54 ± 5.61      0.41    44.14 ± 18.3      0.42    793.33 ± 321      0.41
Wlf       6.54 ± 1.66       0.25    25.81 ± 8.86      0.34    913.58 ± 120      0.13
QCD       20.37 ± 8.14      0.4     18.92 ± 7.57      0.4     271.08 ± 108      0.4
Bkg       253.21 ± 76.8     0.3     273.1 ± 86.1      0.32    3204.53 ± 888     0.28
WH115     3.02 ± 0.39       0.13    2.2 ± 0.15        0.068   7.52 ± 0.53       0.07
ZH115     0.13 ± 0.01       0.077   0.09 ± [missing]  [missing] 0.33 ± 0.03     0.091
Obs       213                       234                       2952

ISOTRK Charged Lepton

DiTop     22.54 ± 3.49      0.15    18.48 ± 2.23      0.12    85.07 ± 9.48      0.11
STopS     7.98 ± 1.02       0.13    5.95 ± 0.59       0.099   20.39 ± 1.98      0.097
STopT     2.2 ± 0.31        0.14    2.17 ± 0.3        0.14    31.47 ± 3.54      0.11
WW        0.24 ± 0.06       0.25    0.85 ± 0.3        0.35    23.17 ± 3.4       0.15
WZ        1.49 ± 0.24       0.16    1.23 ± 0.15       0.12    6.97 ± 0.86       0.12
ZZ        0.11 ± 0.02       0.18    0.07 ± 0.01       0.14    0.39 ± 0.04       0.1
Zjets     2.05 ± 0.3        0.15    2.64 ± 0.39       0.15    37.01 ± 5.22      0.14
Wbb       29.97 ± 13.1      0.44    25.89 ± 11.2      0.43    164.47 ± 71.2     0.43
Wcc       3.36 ± 1.48       0.44    10.21 ± 4.54      0.44    162.51 ± 70.6     0.43
Wlf       2.09 ± 0.64       0.31    7.91 ± 3.15       0.4     246.06 ± 56.7     0.23
QCD       15.42 ± 6.17      0.4     20.22 ± 8.09      0.4     230.14 ± 92.1     0.4
Bkg       87.45 ± 26.8      0.31    95.62 ± 31        0.32    1007.65 ± 315     0.31
WH115     1.01 ± 0.14       0.14    0.74 ± 0.07       0.095   2.46 ± 0.21       0.085
ZH115     0.09 ± 0.01       0.11    0.06 ± 0.01       0.17    0.22 ± 0.02       0.091
Obs       75                        75                        929


Table 8.3: Summary of background and signal predicted event number and data observed
event number for each of the six analysis channels. The table uses the following notations
for backgrounds and signals: DiTop - top quark pair; STopS - single top s channel; STopT
- single top t channel; WW - WW; WZ - WZ; ZZ - ZZ; Zjets - Z plus jets; Wbb - W
plus bb; Wcc - W plus cc or W plus cj; Wlf - W plus light flavour jets incorrectly tagged as
heavy flavour jets (mistags); QCD - non-W (QCD); WH115 (ZH115) - WH (ZH) assuming
a 115 GeV/$c^2$ mass for the Higgs boson. The systematic uncertainties are added linearly in a
conservative but realistic approach, since most of the systematic uncertainties are correlated
between all background processes.
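
The linear addition of uncertainties can be checked directly on one column of Table 8.3. The
sketch below uses the ISOTRK SVTSVT background values from the table and compares the
linear sum with a quadrature sum, which is shown only for contrast. The variable names are
illustrative.

```python
import math

# ISOTRK charged lepton, SVTSVT column of Table 8.3: (prediction, uncertainty).
isotrk_svtsvt = {
    "DiTop": (22.54, 3.49),
    "STopS": (7.98, 1.02),
    "STopT": (2.2, 0.31),
    "WW": (0.24, 0.06),
    "WZ": (1.49, 0.24),
    "ZZ": (0.11, 0.02),
    "Zjets": (2.05, 0.3),
    "Wbb": (29.97, 13.1),
    "Wcc": (3.36, 1.48),
    "Wlf": (2.09, 0.64),
    "QCD": (15.42, 6.17),
}

total = sum(n for n, _ in isotrk_svtsvt.values())
# Linear sum: appropriate when the uncertainties are fully correlated.
err_linear = sum(e for _, e in isotrk_svtsvt.values())
# Quadrature sum: what one would use for uncorrelated uncertainties.
err_quadrature = math.sqrt(sum(e * e for _, e in isotrk_svtsvt.values()))

print(round(total, 2), round(err_linear, 2), round(err_quadrature, 2))
```

The total and the linear sum reproduce the "Bkg" row of the table (87.45 ± 26.8), while the
quadrature sum is noticeably smaller, which shows why the linear choice is the conservative one.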
Figure 8.4: QCD (Non-W) fraction estimate for the SVTnoJP05 sample. The horizontal
axis represents the fully corrected MET. The QCD background is represented in pink, the
remainder of backgrounds in green. The dashed line represents the sum of all the
backgrounds and the points represent the data. The figures represent (left to right and top to
bottom) the CEM, CMUP, CMX and ISOTRK charged lepton categories.



Since we do not see a signal excess, we employ multivariate techniques to separate the signal 
and background even more. We detail these in the following chapter. 
Chapter 9

Neural Network Discriminant 



The event selection is optimized to separate as much as possible the signal and background
processes. Requiring b-tagging separates them even more. For a given b-tagging category, a
WH search based on a counting experiment would be a search for a bump in the distribution
of the invariant mass of the two jets (dijet invariant mass). However, since the expected
signal is about two orders of magnitude smaller than the background prediction, as seen in
Chapter 8, such a bump would not be visible. The main reason why a counting experiment
WH analysis is not sensitive enough is that the dijet invariant mass describes just the Higgs
candidate system and ignores any information about the reconstructed W boson candidate
(charged lepton and missing transverse energy), as well as any correlation between the W
and Higgs systems.

For improved sensitivity, in this analysis we use a multivariate technique. In general, a 
multivariate technique uses several kinematic distributions to discriminate further the signal 
events from the background events. The multivariate technique chosen for this search is an 
artificial neural network. 



9.1 Artificial Neural Networks Overview 



Artificial Neural Networks (ANN) are multivariate functions that are produced
through an iterative training process. An ANN is formed of several interconnected nodes.
Each connection carries a weight, and each hidden node applies a sigmoid function to the
weighted sum of its inputs. The first layer contains input nodes, the
last layer contains the output nodes, and the rest of the nodes are organized into intermediate
hidden layers. Using a feed-forward process, an ANN allows the information to flow from
the input nodes to the output nodes. For a given event, the input nodes receive values of
certain kinematic quantities and the output nodes give the neural network output values for
that event.
9.2 Neural Network Structure



In this analysis we use as a final discriminant the output of an ANN with only one hidden
intermediate layer and with only one node in the output layer, as seen in Figure 9.1. The
ANN is basically a function that takes various quantities from the event as an input and
returns only one number.




[Figure 9.1 layer labels: input nodes, hidden nodes, output node.]



Figure 9.1: The schematic view of an artificial neural network used in this analysis to 
produce the final discriminant between signal and background [3]. 



If there are $N_{inputs}$ nodes in the input layer, every node j in the hidden layer is described by a
sigmoid function that depends on the neural network input values
$x = (x_1, x_2, \ldots, x_i, \ldots, x_{N_{inputs}})$:

$$h_j(x) = \frac{1}{1 + \exp\left(-\sum_{i} u_{ij} x_i\right)}. \qquad (9.1)$$



The weights $u_{ij}$ are determined by training the artificial neural network. Once trained, the
ANN output from the only node in the third layer is computed using a linear combination
of the hidden layer values:

$$f(x) = \sum_{j} v_j h_j(x), \qquad (9.2)$$

where the weights $v_j$ are also determined by training the neural network.
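
The feed-forward evaluation of Equations 9.1 and 9.2 can be sketched in a few lines. The
weights in the usage example are arbitrary illustrative numbers, not trained values, and the
storage convention `u[j][i]` (weight from input i to hidden node j) is a choice made for this
sketch.

```python
import math


def sigmoid(z):
    """Logistic function used by the hidden nodes."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_outputs(x, u):
    """Equation 9.1: u[j][i] is the weight from input i to hidden node j."""
    return [sigmoid(sum(u_ji * x_i for u_ji, x_i in zip(u_j, x))) for u_j in u]


def network_output(x, u, v):
    """Equation 9.2: linear combination of the hidden-node values."""
    h = hidden_outputs(x, u)
    return sum(v_j * h_j for v_j, h_j in zip(v, h))


# Two inputs, two hidden nodes, one output; arbitrary illustrative weights.
u = [[1.0, 2.0], [3.0, 4.0]]
v = [1.0, 1.0]
print(network_output([0.0, 0.0], u, v))   # both hidden nodes give 0.5
print(network_output([0.5, -0.2], u, v))
```

For a vanishing input every hidden node returns 0.5 regardless of the weights, so the output
is simply half the sum of the $v_j$, which is a convenient sanity check of an implementation.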

9.3 Neural Network Training 



Before the training starts, all the weights ($u_{ij}$ and $v_j$) have some initial values. At the end
of the training process, these weights should be such that the ANN output is as close as
possible to the chosen target value (t), which means we minimize the quantity E:

$$E = \frac{1}{2}\left(f(x) - t\right)^2. \qquad (9.3)$$
In this analysis, we choose for the ANN a target value of 1 for signal samples and 0 for
background samples. Therefore we use signal and background samples for the training
process. We divide these samples in half. One half is used for training and the other half is
used for validation of the training on independent samples.

For each event from the training sample, we minimize the quantity E. We then use back
propagation to change the values of the weights ($u_{ij}$ and $v_j$) by an amount proportional to
the difference between the ANN output and the target. The training is finished when new
events no longer change these weights in a significant way.
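
A single back-propagation update for the quantity E can be sketched as follows. This is plain
stochastic gradient descent on a toy two-event sample, shown only to illustrate the principle;
it is not the Bayesian training used for the final discriminant, and the learning rate,
initial weights and input values are all hypothetical.

```python
import math


def forward(x, u, v):
    """Return the hidden-node values and the network output f(x)."""
    h = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x)))) for row in u]
    return h, sum(vj * hj for vj, hj in zip(v, h))


def train_step(x, t, u, v, rate=0.1):
    """Update the weights in place for E = 0.5 * (f(x) - t)**2.

    Returns the value of E before the update.
    """
    h, f = forward(x, u, v)
    delta = f - t  # dE/df
    for j, hj in enumerate(h):
        grad_v = delta * hj                           # dE/dv_j
        grad_hidden = delta * v[j] * hj * (1.0 - hj)  # dE/dz_j, uses v_j before its update
        for i, xi in enumerate(x):
            u[j][i] -= rate * grad_hidden * xi        # dE/du_ji = dE/dz_j * x_i
        v[j] -= rate * grad_v
    return 0.5 * delta * delta


# Toy sample: one signal-like event (target 1) and one background-like event (target 0).
sample = [([1.0, 0.5], 1.0), ([-1.0, -0.5], 0.0)]
u = [[0.1, -0.2], [0.3, 0.1]]
v = [0.2, -0.1]

errors = []
for epoch in range(200):
    errors.append(sum(train_step(x, t, u, v) for x, t in sample))

first_error, last_error = errors[0], errors[-1]
print(first_error, last_error)
```

The summed error decreases as the weights are updated, and in practice the loop would stop
once further events no longer change the weights significantly.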

This is not the first search that uses an ANN at CDF. It is now a standard multivariate 
technique that is used in high energy physics experiments. 

There are many types of ANN. In this search we use a Bayesian Neural Network (BNN)
algorithm [86] [87]. The advantage of a BNN over other artificial neural network algorithms is
that it is less prone to overtraining because of the Bayesian statistical interpretation, in which
each weight is described by a posterior probability obtained from Bayes' theorem.

For this analysis, we employ distinct BNN discriminant functions, each optimized for
one of the three b-tagging categories: SVTSVT, SVTJP05 and SVTnoJP05.

9.4 Neural Network Training Check 

Once training is done, we perform a check for overtraining. Overtraining may happen when 
the neural network "learns" the training sample instead of "generalizing" a model from the 
training sample. In overtraining the smallest statistical fluctuation is believed to be part 
of the real model. For this reason, an overtrained neural network has very little predictive 
power. If overtrained, the neural network output distribution for an independent test sample 
will not be the same as the one for the training sample. 

We check that our neural networks are not overtrained by comparing the training sample
shape to that for a test sample which was not used in training. Figure 9.2 shows examples
of an overtraining check for a Higgs mass of 115 GeV/$c^2$. We conclude that our BNN
discriminant is not overtrained, since the response of the training sample is in good agreement
with that of the test sample.
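
One way to quantify such a comparison is sketched below. In this thesis the check is done by
comparing the output shapes of the two samples; the two-sample Kolmogorov-Smirnov distance
used here is a substitute technique chosen for illustration, and the output values are toy
numbers.

```python
# Sketch of a numerical overtraining check: the maximum distance between the
# cumulative distributions of the network output in the training and test samples.
# The Kolmogorov-Smirnov distance is a swapped-in illustration, not the thesis method.


def ks_distance(sample_a, sample_b):
    """Return the two-sample Kolmogorov-Smirnov distance (between 0 and 1)."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past the current value x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        # Compare the empirical cumulative fractions at x.
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


# Toy network outputs for the training and the independent test samples.
train_output = [0.05, 0.10, 0.20, 0.80, 0.90, 0.95]
test_output = [0.07, 0.12, 0.25, 0.78, 0.88, 0.97]
print(ks_distance(train_output, test_output))
```

A distance close to zero indicates that the two samples have similar output distributions,
whereas an overtrained network would typically give a visibly larger distance.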

9.5 Neural Network Inputs 

Each BNN is optimized independently to separate the WH signal from the various
background processes (W+HF, W+LF, top quark pair, single top s-channel production). We
use distinct BNN input quantities for each tagging category.

Figure 9.2: Bayesian Neural Network output for WH signal (red) and background (black)
simulated events with a Higgs boson mass of 115 GeV/$c^2$ for the training sample (solid lines)
and orthogonal test sample (circles) [3]. We can see that indeed the signal peaks at output
values of 1 and background at output values of 0. We also see that the two samples give
the same shape and normalization, which confirms that the BNN training is to be trusted.
The top left plot refers to the SVTSVT b-tagging category, the top right plot refers to
the SVTJP05 b-tagging category and the bottom plot refers to the SVTnoJP05 b-tagging
category.

For the SVTSVT b-tagging category, seven kinematic quantities are used as inputs. The first
one is the invariant mass of the two jets in the event ($M_{jj}$). This quantity is computed after
both jets have their energies corrected using another type of artificial neural network, as
described in Section 9.8. The second quantity is the scalar sum of the transverse momenta of
the charged lepton and the jets, from which the missing transverse energy is subtracted ($p_T$
imbalance). The third one is the invariant mass of the charged lepton, missing transverse
energy and one of the two jets, where we choose the jet that produces the largest invariant
mass ($M_{l\nu j}^{max}$). The fourth one is the charge of the charged lepton multiplied by its $\eta$
coordinate ($Q_{lep} \cdot \eta_{lep}$). The fifth one is the scalar sum of the transverse energy of the
loose jets of the events ($\sum E_T$(loose jets)). The loose jets are orthogonal to the tight jets
and have $E_T$ > 12 GeV and $|\eta|$ < 2.4. The tight jets are explicitly not included in this
summation. The sixth one is the transverse momentum of the reconstructed W boson
candidate, computed as the vector sum of the transverse momentum of the charged lepton
and missing transverse energy ($p_T(W)$). The seventh one is the scalar sum of the transverse
energies of all the objects in the event, such as jets, charged lepton and missing transverse
energy ($H_T$).

For the SVTJP05 b-tagging category, we also use seven kinematic quantities as inputs. Five
of them are the same as for the SVTSVT category, namely $M_{jj}$, $Q_{lep} \cdot \eta_{lep}$,
$\sum E_T$(loose jets), $p_T(W)$ and $H_T$. Instead of $M_{l\nu j}^{max}$ we use $M_{l\nu j}^{min}$, which
means we pick the jet that minimizes the quantity. We also use missing transverse energy
($\not\!E_T$) instead of $p_T$ imbalance. Both
changes are motivated by the fact that for each b-tagging category we have tested a large
number of input parameter combinations and we have chosen the inputs that give the largest
sensitivity in that category.

For the SVTnoJP05 b-tagging category, we also use seven kinematic quantities as inputs,
namely $M_{jj}$, $Q_{lep} \cdot \eta_{lep}$, $\sum E_T$(loose jets), $p_T(W)$, $H_T$, and $p_T$ imbalance.
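
Several of these input quantities follow directly from their definitions, as sketched below
for $Q_{lep} \cdot \eta_{lep}$, $p_T(W)$, $H_T$ and the $p_T$ imbalance. The function names and the toy
event are hypothetical, and jets are treated as massless so that their transverse momentum
and transverse energy coincide, which is a simplification of this sketch.

```python
import math


def q_times_eta(lep_charge, lep_eta):
    """Charge of the charged lepton multiplied by its pseudorapidity."""
    return lep_charge * lep_eta


def pt_w(lep_pt, lep_phi, met, met_phi):
    """Transverse momentum of the W candidate: vector sum of lepton pT and MET."""
    px = lep_pt * math.cos(lep_phi) + met * math.cos(met_phi)
    py = lep_pt * math.sin(lep_phi) + met * math.sin(met_phi)
    return math.hypot(px, py)


def h_t(lep_pt, met, jet_ets):
    """Scalar sum of the transverse energies of the jets, the lepton and the MET."""
    return lep_pt + met + sum(jet_ets)


def pt_imbalance(lep_pt, met, jet_pts):
    """Scalar sum of lepton and jet transverse momenta minus the MET."""
    return lep_pt + sum(jet_pts) - met


# Toy event (GeV and radians): a negative lepton, MET and two tight jets.
lep_pt, lep_phi, lep_eta, lep_charge = 30.0, 0.0, 0.5, -1
met, met_phi = 40.0, math.pi / 2.0
jets = [50.0, 40.0]

print(q_times_eta(lep_charge, lep_eta))
print(pt_w(lep_pt, lep_phi, met, met_phi))
print(h_t(lep_pt, met, jets))
print(pt_imbalance(lep_pt, met, jets))
```

In the toy event the lepton and the missing transverse energy are perpendicular in the
transverse plane, so the W candidate transverse momentum is the hypotenuse of a 30-40-50
triangle.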

9.6 Background Modelling Check 

We plot the distribution of each of the inputs and outputs of the BNN to check that the total
background prediction agrees with the observed data distribution, as seen in Figures 9.3
and 9.4 and in those of Appendix D. This procedure cross checks that our various background
processes are modelled well. As all distributions agree in normalization and shape with the
data distributions, we are safe to use the BNN output in the final limit calculation.

9.7 Neural Network Output 

From a physics point of view, the BNN takes as an input an entire event (from which it selects
the information required by the input nodes) and gives as an output only one value, which
quantifies how signal-like or background-like the event is. The BNN is trained for signal
to peak at output values of 1 and background at output values of 0. The training sample
is obtained using the same event selection described in Chapter 5. The training is done
independently for Higgs boson masses between 100 and 150 GeV/$c^2$ with increments of 5
GeV/$c^2$. The expected separation between signal and background is confirmed in the
normalized-to-unit-area BNN output distributions
for signal and background for the various b-tagging categories and the TIGHT and ISOTRK
charged lepton categories, where the Higgs boson is assumed to have a mass of 115 GeV/$c^2$,
which can be seen in Figure 9.5.

9.8 Neural Network for Jet Energy Correction 

All quantities in the analysis are computed using the jet energies corrected for various
effects, as described in an earlier chapter. However, as the dijet invariant mass is the most
sensitive observable to distinguish between the WH signal and various types of background,
we make an extra effort to reduce its uncertainty. This improves directly the WH search
sensitivity.

We have designed a method that corrects the jet energies once more, depending on
whether they have been b-tagged or not [88]. Each correction is implemented on a
per-jet basis. Typically the jet energies measured by the calorimeter are underestimated,
and even more so for jets originating from b quarks, due to the semi-leptonic decays of the b
quark producing a muon that is a minimum ionizing particle in the calorimeter. We add
vertex and tracking information about the jet in order to improve the jet energy resolution.

To achieve this goal, we use a multivariate regression technique in the form of a second 
artificial neural network algorithm. All neural network algorithms provided by ROOT have 
been tested and the best results were obtained with the Broyden-Fletcher-Goldfarb-Shanno 
(BFGS) method [55]. 

We train three BFGS neural networks, one for jets tagged by SecVtx, one for jets not tagged 
by SecVtx but tagged by JetProb, and one for jets that are neither tagged by SecVtx nor 
by JetProb. We use only WH signal samples, that are divided in two. One half is used 
as training sample and another one as test sample. We do not use background samples 
because we are focused on improving the dijet mass resolution for signal. This quantity 
would anyway have a broad distribution for background processes. The input nodes use 
quantities that describe the jet and will be enumerated below. There is only one output 
value and its target value has a specific value for each jet taken as an input by the neural 
network. In principle this method works for any type of jet, but it must be trained for the 
specific type of jet. 
9.8.1 BFGS Inputs



For the neural network trained on SecVtx-tagged jets, we use the following nine quantities
as inputs: jet $E_T$; jet $p_T$; uncorrected jet $E_T$; jet transverse mass; jet decay length ($L_{xy}$);
uncertainty on the jet decay length ($\sigma(L_{xy})$); sum of transverse momenta of tracks originating
from the secondary vertex identified by the SecVtx algorithm; the maximum $p_T$ of tracks
inside the jet; the scalar sum of transverse momenta of tracks inside the jet. For the jets
not tagged by SecVtx, we use the same input variables except $L_{xy}$ and $\sigma(L_{xy})$.

9.8.2 BFGS Output 



The neural network is designed to have as an output ($NN_{output}$) the extra correction factor
by which the standard-corrected jet transverse energies ($E_T$) have to be multiplied in order
to achieve the newly corrected values ($E_T^{corr}$):

$$E_T^{corr} = NN_{output} \cdot E_T. \qquad (9.4)$$



For each jet, the target value (t) for $NN_{output}$ is represented by the ratio between the
Monte Carlo generator level transverse energy ($E_T^{gen}$) and the standard-corrected transverse
energy ($E_T$):

$$t = \frac{E_T^{gen}}{E_T}. \qquad (9.5)$$

Following this procedure, after the training, $E_T^{corr}$ will be closer to $E_T^{gen}$ than $E_T$ is.
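
The relation between the regression target and the corrected energy in Equations 9.4 and 9.5
can be sketched as follows. The jet energies are toy numbers and the "imperfect" network
output is hypothetical; the sketch only shows that a network output close to the target moves
the corrected energy towards the generator-level value.

```python
# Sketch of Equations 9.4 and 9.5 with toy jet energies (GeV).


def regression_target(et_gen, et):
    """Equation 9.5: target value for the network output."""
    return et_gen / et


def corrected_et(nn_output, et):
    """Equation 9.4: apply the network output as a multiplicative correction."""
    return nn_output * et


et_gen, et = 60.0, 50.0            # generator-level and standard-corrected E_T
target = regression_target(et_gen, et)

# A perfectly trained network would return the target and recover et_gen.
et_perfect = corrected_et(target, et)

# A hypothetical imperfect output still improves on the uncorrected value.
et_imperfect = corrected_et(1.15, et)
print(target, et_perfect, et_imperfect)
```

In this toy case the imperfect correction leaves a 2.5 GeV difference with respect to the
generator-level energy, compared with 10 GeV before the correction.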
9.8.3 Background Modelling Check 



For this second neural network type, used for b-jet energy corrections, we also check that
the background modelling reproduces the data well for all the kinematic distributions of the
inputs and output of these two jet energy correction neural networks. For SVTSVT events,
we sum up the histograms for each of the two tight jets in the events so that we double
the statistical power. Figure 9.6 shows the input and the output of the neural network
based SVT-correction, on a jet by jet basis, showing that this correction works both for
Monte-Carlo-simulated and data events.

9.8.4 Dijet Invariant Mass Resolution 

We now compute the dijet invariant mass using $E_T^{corr}$ instead of $E_T$ for all the samples
used in our analysis: Pretag, SVTSVT, SVTJP05 and SVTnoJP05. This specific correction
improves the resolution of the dijet invariant mass from about 15% to about 11% in the
SVTSVT category and from about 17% to about 14% in the SVTnoJP05 category across
the entire Higgs boson mass range, as we can see in Figure 9.7.



9.9 Summary 



In conclusion, in this chapter we have presented two artificial neural networks that we use
in our analysis. One corrects the energy of the jets based on the information of whether the
jet is tagged or not by the b-taggers used in our analysis. The corrected energies are used to
compute the dijet invariant mass variable, which in turn is the main input variable to the
final analysis discriminant, another artificial neural network (BNN) trained to separate the
signal and the various background processes. In the next chapter we will present how the
BNN output is used in order to compute an upper limit on the Higgs boson cross section
times branching ratio.
Figure 9.3: First half of the control plots for TIGHT charged lepton Pretag BNN Input
Variables.

Figure 9.4: Second half of the control plots for TIGHT charged lepton Pretag BNN Input
and the BNN Output Variable for the Higgs mass of 115 GeV/$c^2$.









Figure 9.5: Normalized-to-unit-area BNN output distributions for signal and various
background types for the various b-tagging categories and a Higgs boson mass of 115 GeV/$c^2$.
From top to bottom, the plots are specific to the SVTSVT, SVTJP05 and SVTnoJP05
b-tagging categories, respectively. The left (right) plots correspond to the TIGHT (ISOTRK)
charged lepton categories, respectively.








Figure 9.6: NN b-jet energy correction input variables for SecVtx tight-tagged jets and the
output of the b-jet neural network correction. The b-NN jet energy correction was implemented
by Timo Aaltonen and these plots were produced by me. They are shown in a paper
submitted to Nuclear Instruments and Methods, for which I am one of the four authors [88].
The figures represent a sample of TIGHT charged leptons in the SVTSVT tag category
events where both tight jets in the events are shown on the same histogram in order to
double the statistics.
Figure 9.7: Reconstructed Higgs boson invariant mass resolution in a WH sample using
the standard jet energy correction and the neural network jet energy correction. The top
(bottom) plot presents the SVTSVT (SVTnoJP05) b-tagging category [3]. Although not
exemplified in this plot, we also apply the jet energy correction to the dijet invariant mass
in the Pretag and SVTJP05 samples. We see an improvement of dijet mass resolution
across the entire Higgs boson mass range when the neural network corrections are used. As
this quantity is the most sensitive input to the final discriminant neural network, our WH
exclusion limits improve due to this correction. Image credit: Timo Aaltonen.
Chapter 10 

Upper Limits on Higgs Boson Production 

In this thesis we present a search for the Standard Model Higgs boson. In the absence of an 
observation, our final result is an upper limit on the Higgs boson cross section times 
branching ratio at 95% credibility level (CL), as is typical of Bayesian statistical 
inference in experimental particle physics. We compute the upper limit using the BNN output 
distributions for signal, background and data. Since the BNN distributions are binned, we 
use a binned likelihood technique. In Section 10.1 we introduce an unbinned Bayesian 
likelihood technique, i.e. the treatment of one specific bin of the distribution, which is 
equivalent to a simple counting experiment. In Section 10.2 we explain how each independent 
systematic uncertainty is taken into account as a nuisance parameter that is convolved with 
the likelihood. In Section 10.3 we present how the method is generalized to a given number 
of bins for one analysis channel, i.e. a pair of charged lepton and b-tagging categories. In 
Section 10.4 we explain how combining all analysis channels is equivalent to adding new bins 
to a single distribution. In Section 10.5 we explain how we use pseudo-experiments to 
compute the expected limit (the sensitivity of the analysis) and real data to compute the 
observed limit (our result). In Section 10.6 we present the expected and observed upper 
limits of the WH search in the TIGHT, ISOTRK and TIGHT+ISOTRK channels, with all b-tagging 
categories combined. 

10.1 Bayesian Upper Limit Calculation 

On a bin-by-bin basis, we denote by $\mu$ the expected number of events and by $n$ the 
measured number of events. Our goal is to evaluate the probability that we expect $\mu$ 
events when we measure $n$ events, given that the probability that we measure $n$ events 
when we expect $\mu$ events is described by the Poisson distribution of Equation 10.1. This 
process is called statistical inference. 

$$P(n|\mu) = \frac{\mu^{n}\, e^{-\mu}}{n!} . \tag{10.1}$$



There are two principal methodologies for statistical inference. The frequentist approach 
assumes no prior knowledge about the generic probabilities $P(\mu)$ and $P(n)$ and 
introduces a test statistic that is used to test whether the data distribution agrees better 
with the background-plus-signal hypothesis or with the background-only hypothesis. The 
frequentist approach is used to set upper limits on Higgs boson production at the DZero 
experiment at Fermilab. At CDF we use the Bayesian approach, where prior information is 
taken into account as well. We start with Bayes' theorem: 

$$P(\mu|n) = \frac{L(n|\mu)\, \pi(\mu)}{P(n)} , \tag{10.2}$$

where $P(\mu|n)$ is the posterior probability of the expected number of events $\mu$, i.e. 
the probability distribution of $\mu$ after the experiment is performed; $L(n|\mu)$ is the 
probability to measure $n$ events given the expected number of events $\mu$, which follows a 
Poisson distribution and is given by Equation 10.3; $\pi(\mu)$ is the prior probability of 
the expected number of events $\mu$, i.e. the probability distribution of $\mu$ before the 
experiment is performed; $P(n)$ is the probability distribution of the observed number of 
events $n$, which is evaluated as a normalization constant from the condition that the sum 
of all probabilities should be exactly 1, as shown in Equation 10.4. 

$$L(n|\mu) = \frac{\mu^{n}\, e^{-\mu}}{n!} . \tag{10.3}$$

$$\int_{\mu_{\min}}^{\mu_{\max}} P(\mu|n)\, d\mu = 1 . \tag{10.4}$$



By combining Equations 10.2 and 10.4 we obtain 

$$\int_{\mu_{\min}}^{\mu_{\max}} \frac{L(n|\mu)\, \pi(\mu)}{P(n)}\, d\mu = 1 , \tag{10.5}$$

which leads to the formula for $P(n)$ 

$$P(n) = \int_{\mu_{\min}}^{\mu_{\max}} L(n|\mu)\, \pi(\mu)\, d\mu . \tag{10.6}$$



A decision has to be made about what prior information should be assumed about the 
distribution of the expected number of events. We assume a flat prior, 
$\pi(\mu) = \text{constant}$, in order not to bias ourselves towards a certain distribution 
of the signal while still being able to deal with systematic uncertainties. Given that 
$P(n)$ is also a constant with respect to $\mu$, we define $c_n$ as a constant that depends 
only on $n$ and is given by the formula 

$$c_n = \frac{\pi(\mu)}{P(n)} . \tag{10.7}$$



Combining Equations 10.2 and 10.7 we can express the posterior probability as 

$$P(\mu|n) = c_n\, \frac{\mu^{n}\, e^{-\mu}}{n!} . \tag{10.8}$$

In our analysis, the number of expected events $\mu$ is the sum of the expected number of 
background events ($B$) and the expected number of signal events ($S$), as seen in the 
equation 

$$\mu = B + S . \tag{10.9}$$

Also, the observed number of events $n$ is the number of measured data events ($D$), as in 

$$n = D . \tag{10.10}$$



By combining Equations 10.8, 10.9 and 10.10, the equation for the posterior probability 
becomes 

$$P(B+S|D) = c_D\, \frac{(B+S)^{D}\, e^{-(B+S)}}{D!} . \tag{10.11}$$



In our analysis, we want to set an upper limit on the number of Higgs signal events $S$, 
given not only the measured number of data events $D$, but also the number of expected 
background events $B$. All the predictions of the Standard Model have been confirmed, except 
the existence of the Higgs boson. This is why we are sure, within uncertainties, of the 
existence of the background processes and of their predicted number of events $B$. We then 
reinterpret the posterior probability as the posterior probability of having $S$ expected 
signal events given $B$ expected background events and $D$ measured data events, and we 
denote it by $P(S|D,B)$. After the change of variable from $B+S$ to $S$, Equation 10.11 
becomes 

$$P(S|D,B) = c_D\, \frac{(B+S)^{D}\, e^{-(B+S)}}{D!} . \tag{10.12}$$



We denote by $s$ the number of signal events predicted by the Standard Model. We express 
the true expected number of signal events $S$ by introducing the ratio $f$ such that 
$S = f \cdot s$. We also note that the total background prediction is made up of several 
background processes. We therefore write $B = \sum_k b_k$, where $k$ is the index of a given 
background sample. The expected number of events is now 

$$\mu = \sum_k b_k + f \cdot s . \tag{10.13}$$



After a change of variable from $S$ to $f$, Equation 10.12 becomes 

$$P\Big(f \,\Big|\, D, \sum_k b_k, s\Big) = c'_D\, \frac{\big(\sum_k b_k + f \cdot s\big)^{D}\, e^{-\left(\sum_k b_k + f \cdot s\right)}}{D!} . \tag{10.14}$$



By definition, the minimum value that $f$ can take is zero. This case corresponds to the 
background-only hypothesis. Any positive value of $f$ corresponds to the 
background-plus-signal hypothesis. In this analysis, we set a 95% credibility level upper 
limit on $f$, as is typical in experimental particle physics searches for new particles (the 
Standard Model Higgs boson or new elementary particles predicted by theories beyond the 
Standard Model). In other words, we want to find the upper value $l$ for the ratio $f$ 
between the true number of expected signal events $S$ and the Standard Model prediction $s$, 
such that the probability that $f$ lies in the interval $[0, l]$ is 0.95. Therefore, we have 
to solve for $l$ in the equation 

$$\int_0^{l} P\Big(f \,\Big|\, D, \sum_k b_k, s\Big)\, df = 0.95 . \tag{10.15}$$



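By combining Equations 10.14 and 10.15 we have to solve for $l$ in the equation 

$$\int_0^{l} c'_D\, \frac{\big(\sum_k b_k + f \cdot s\big)^{D}\, e^{-\left(\sum_k b_k + f \cdot s\right)}}{D!}\, df = 0.95 . \tag{10.16}$$

Similarly to Equation 10.5, the constant term $c'_D$ is given by the condition 

$$\int_0^{\infty} c'_D\, \frac{\big(\sum_k b_k + f \cdot s\big)^{D}\, e^{-\left(\sum_k b_k + f \cdot s\right)}}{D!}\, df = 1 \tag{10.17}$$

to be 

$$c'_D = \frac{1}{\int_0^{\infty} \frac{\left(\sum_k b_k + f \cdot s\right)^{D}\, e^{-\left(\sum_k b_k + f \cdot s\right)}}{D!}\, df} . \tag{10.18}$$

By combining Equations 10.15 and 10.18 we obtain the equation for $l$: 

$$\frac{\int_0^{l} \frac{\left(\sum_k b_k + f \cdot s\right)^{D}\, e^{-\left(\sum_k b_k + f \cdot s\right)}}{D!}\, df}{\int_0^{\infty} \frac{\left(\sum_k b_k + f \cdot s\right)^{D}\, e^{-\left(\sum_k b_k + f \cdot s\right)}}{D!}\, df} = 0.95 . \tag{10.19}$$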



Equation 10.19 cannot be solved analytically, so numerical integration has to be used. 
The strategy is to consider increasingly higher values of $l$, compute the integral and stop 
as soon as the value of 0.95 is reached. 

In conclusion, this is the Bayesian approach to computing the upper limit $l$ for an 
analysis with only one bin in the final discriminant (a simple counting experiment) and 
without taking any systematic uncertainties into account. In the following sections we will 
present how to take into account the systematic uncertainties, several bins and several 
analysis channels. 
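
The scan over increasing values of the upper limit described above can be sketched in a few 
lines of Python. This is a minimal illustration and not the code used in the analysis: the 
function names, the grid step and the cut-off `f_max` that replaces the infinite upper bound 
of the normalization integral are assumptions made for the example, and the event counts in 
the usage line are invented. 

```python
import math


def poisson_posterior(f, data, background, signal):
    """Unnormalized posterior density of f for a flat prior.

    Assumes background > 0 so that the expected number of events is never zero.
    """
    mu = background + f * signal
    # Evaluate the Poisson term through logarithms to avoid overflow at large counts.
    return math.exp(data * math.log(mu) - mu - math.lgamma(data + 1))


def upper_limit(data, background, signal, cl=0.95, step=1e-3, f_max=50.0):
    """Scan increasing l until the normalized integral of the posterior reaches cl."""
    n_steps = int(round(f_max / step))
    values = [poisson_posterior(i * step, data, background, signal)
              for i in range(n_steps + 1)]
    # Cumulative trapezoidal integral on a regular grid in f.
    cumulative = [0.0]
    for left, right in zip(values, values[1:]):
        cumulative.append(cumulative[-1] + 0.5 * (left + right) * step)
    total = cumulative[-1]  # normalization integral, truncated at f_max
    for i, partial in enumerate(cumulative):
        if partial >= cl * total:
            return i * step
    return f_max


if __name__ == "__main__":
    # Hypothetical counting experiment: 3 expected background events, 1 expected SM signal event.
    print(upper_limit(data=0, background=3.0, signal=1.0))
```

For zero observed events the posterior is a pure exponential in the signal ratio, so the 
result can be cross-checked against the analytic value $-\ln(0.05)/s \approx 3.0/s$. 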

10.2 Taking Systematic Uncertainties Into Account 



The procedure described in the previous section does not take into account the systematic 
uncertainties on the number of events predicted by the Standard Model for the several 
background processes ($b_k$) and for the Higgs boson process ($s$). 

There are two types of systematic uncertainties. Rate systematic uncertainties apply in the 
same way to all the bins in the distribution. Shape systematic uncertainties are rate 
systematic uncertainties applied on a bin-by-bin basis. In this section we describe how rate 
systematic uncertainties are introduced in a one-bin analysis. Shape systematics for binned 
discriminants are described in the following section. In our analysis we employ only 
symmetric systematic uncertainties characterized by the standard deviation $\sigma$. 

For each independent systematic uncertainty we introduce a nuisance parameter $\nu_j$ as a 
coefficient to the expected number of events for a particular process. Most systematic 
uncertainties affect multiple physics processes. If we denote by $A_{sig}$ and $A_{bkg,k}$ 
the sets of nuisance parameters that apply to the signal and to the $k$-th background 
process, respectively, then we can express the expected number of events $\mu$ as 

$$\mu\Big(f, \sum_k b_k, s, \sum_j \nu_j\Big) = \sum_k \Big[\Big(\prod_{j \in A_{bkg,k}} \nu_j\Big) \cdot b_k\Big] + f \cdot \Big(\prod_{j \in A_{sig}} \nu_j\Big) \cdot s . \tag{10.20}$$

We model each nuisance parameter $\nu_j$ as a truncated Gaussian distribution. We recall 
that a Gaussian distribution of a variable $x$ with mean $m$ and standard deviation $\sigma$ 
is given by 

$$G(x|m,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{(x-m)^2}{2\sigma^2}} . \tag{10.21}$$

Since nuisance parameters are used as coefficients for the predicted number of events of 
each background and signal process, we set $\sigma$ to the value of the chosen systematic 
uncertainty for that particular process, expressed as the ratio of the absolute systematic 
uncertainty to the absolute expected number of events. For example, if a systematic 
uncertainty is 3%, we set $\sigma = 0.03$. This implies that the mean $m$ is set to 1.0. 
Therefore, the nuisance parameters $\nu_j$ are described by the following Gaussian 
distribution: 

$$G(\nu_j|1.0,\sigma_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \cdot e^{-\frac{(\nu_j-1)^2}{2\sigma_j^2}} . \tag{10.22}$$

We truncate the Gaussian distribution of each nuisance parameter in order to keep only 
those values that make physical sense, such as positive values. 

Following the statistics section of the Particle Data Group review [1], we take the 
systematic uncertainties into account by convolving each nuisance parameter into the 
likelihood function, integrating over the nuisance parameters and then repeating the 
reasoning of the previous section using the new likelihood. The likelihood is a function of 
the nuisance parameters $\nu_j$ since it is a function of the expected number of events 
$\mu$, which is itself a function of the nuisance parameters, as described in Equation 
10.20. The likelihood is given by 

$$L\Big(D \,\Big|\, f, \sum_k b_k, s, \sum_j \nu_j\Big) = \frac{\mu^{D}\, e^{-\mu}}{D!} , \tag{10.23}$$

where $\mu$ is the expected number of events of Equation 10.20. 



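The new likelihood is a convolution of the old likelihood of Equation 10.23 with all the 
nuisance parameters modelled by Gaussian distributions and is given by 

$$L\Big(D \,\Big|\, f, \sum_k b_k, s\Big) = \int \cdots \int L\Big(D \,\Big|\, f, \sum_k b_k, s, \sum_j \nu_j\Big) \cdot \prod_j G(\nu_j|1.0,\sigma_j)\, \prod_j d\nu_j . \tag{10.24}$$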



Each integral is performed between a lower bound, chosen such that the nuisance parameter 
is positive, and infinity. This process is called integrating out the nuisance parameters, 
and the result is that the new likelihood no longer depends on the nuisance parameters. 
Since such an integral over many parameters is very difficult to compute with ordinary Monte 
Carlo methods, a Markov Chain Monte Carlo (MCMC) [4] method is used in our analysis. 

Since the likelihood now depends only on $f$, $s$ and $b_k$, as if the systematic 
uncertainties did not exist, we have reduced our problem to the simpler problem solved in 
the previous section. The upper limit $l$ is computed by yet another integration, given by 
the generic formula 

$$\frac{\int_0^{l} L\big(D \,\big|\, f, \sum_k b_k, s\big)\, df}{\int_0^{\infty} L\big(D \,\big|\, f, \sum_k b_k, s\big)\, df} = 0.95 . \tag{10.25}$$
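
As an illustration of integrating out the nuisance parameters, the sketch below averages the 
Poisson likelihood over random draws of truncated Gaussian nuisance parameters. It makes 
several simplifying assumptions that are not taken from the analysis: plain Monte Carlo 
averaging stands in for the MCMC integration, each process is given a single independent 
nuisance parameter, and all names and numbers are hypothetical. 

```python
import math
import random


def truncated_gauss(rng, sigma):
    """Draw nu from a Gaussian of mean 1.0 and width sigma, keeping only nu > 0."""
    while True:
        nu = rng.gauss(1.0, sigma)
        if nu > 0.0:
            return nu


def poisson(data, mu):
    """Poisson probability of observing `data` events when `mu` are expected."""
    return math.exp(data * math.log(mu) - mu - math.lgamma(data + 1))


def marginal_likelihood(f, data, backgrounds, bkg_sigmas, signal, sig_sigma,
                        n_samples=2000, seed=1):
    """Monte Carlo estimate of the likelihood with the nuisance parameters integrated out."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Expected events for this draw: each process is scaled by its own nuisance parameter.
        mu = sum(truncated_gauss(rng, sigma) * b
                 for b, sigma in zip(backgrounds, bkg_sigmas))
        mu += f * truncated_gauss(rng, sig_sigma) * signal
        total += poisson(data, mu)
    return total / n_samples
```

With all uncertainties set to zero the estimate reduces to the plain Poisson likelihood, 
which gives a simple consistency check of the sketch. 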

10.3 Taking All The Bins Into Account 



In the previous two sections we have described the Bayesian statistical inference of an 
upper limit assuming there is only one bin in our chosen distribution, as is the case in a 
simple counting experiment. We increase the sensitivity of the analysis if we take advantage 
of the shapes of the distributions, i.e. if we consider each bin of the BNN output 
distribution separately. The signal over background ratio (S/B) changes from bin to bin. By 
construction of our BNN output, lower S/B is achieved for low values of the BNN output and 
higher S/B for high values. 

Rate systematics such as described above apply in the same way to all the bins. However, 
shape systematics must now be taken into account as well. A shape change is actually a 
change in values on a bin-by-bin basis, so a shape systematic can be understood as a rate 
systematic that is bin-specific. For each shape systematic we introduce a new nuisance 
parameter modelled by a Gaussian distribution. We integrate all the rate and shape nuisance 
parameters in the new binned likelihood in order to obtain the analysis likelihood. The only 
difference with respect to the previous section is the integration range for the shape 
nuisance parameters. For rate systematics, the range extends from the value at which the 
parameter becomes positive to infinity. For a shape systematic, the range is computed from 
the templates for the upper fluctuation, the lower fluctuation and the central value of the 
shape. In our analysis we use only one shape systematic, namely the one due to the jet 
energy scale. 

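The likelihood of the $i$-th bin is given by 

$$L_i\Big(D_i \,\Big|\, f, \sum_k b_k, s, \sum_j \nu_{ij}\Big) = \frac{\mu_i^{D_i}\, e^{-\mu_i}}{D_i!} , \tag{10.26}$$

where $\mu_i$ is the expected number of events in the $i$-th bin and $\nu_{ij}$ represents 
both the rate and shape nuisance parameters for the $i$-th bin. Since all bins are 
statistically independent, the likelihood of the whole binned BNN distribution is given by 
the product of the likelihoods of each bin, namely 

$$L\Big(D \,\Big|\, f, \sum_k b_k, s, \sum_j \nu_j\Big) = \prod_{i \in \text{bins}} L_i\Big(D_i \,\Big|\, f, \sum_k b_k, s, \sum_j \nu_{ij}\Big) . \tag{10.27}$$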

We have now reduced the problem to that of a one-bin distribution. We integrate out the 
nuisance parameters as in Equation 10.24 and we compute the upper limit as in Equation 
10.25. 
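
In practice the product of per-bin likelihoods becomes a sum of logarithms. The sketch 
below, which leaves out the nuisance parameters for brevity, evaluates the binned likelihood 
on a grid in the signal ratio and finds the 95% upper limit. The bin contents used to 
exercise it, the grid spacing, the cut-off `f_max` and the function names are invented for 
the illustration. 

```python
import math


def log_poisson(data, mu):
    """Logarithm of the Poisson probability of `data` events for an expectation `mu`."""
    return data * math.log(mu) - mu - math.lgamma(data + 1)


def binned_log_likelihood(f, data, background, signal):
    """Log of the product over bins (no nuisance parameters): a sum of per-bin terms."""
    return sum(log_poisson(d, b + f * s)
               for d, b, s in zip(data, background, signal))


def binned_upper_limit(data, background, signal, cl=0.95, step=0.01, f_max=100.0):
    """Smallest grid value l such that the posterior mass in [0, l] reaches cl."""
    grid = [i * step for i in range(int(round(f_max / step)) + 1)]
    logs = [binned_log_likelihood(f, data, background, signal) for f in grid]
    peak = max(logs)
    # Subtract the peak before exponentiating to stay within floating-point range.
    weights = [math.exp(value - peak) for value in logs]
    total = sum(weights)
    running = 0.0
    for f, weight in zip(grid, weights):
        running += weight
        if running >= cl * total:
            return f
    return f_max
```

Applied to a hypothetical two-bin histogram in which the signal is concentrated in a 
low-background bin, the function returns a tighter limit than for the same events collapsed 
into a single bin, which is the gain in sensitivity that motivates the binned treatment. 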

10.4 Taking Into Account All Analysis Channels 

In the previous section we have presented the limit calculation for one analysis channel 
with a binned distribution. However, our analysis has six independent channels, each given 
by a pair of charged lepton and b-tagging categories. In this analysis we have two charged 
lepton categories (TIGHT and ISOTRK) and three b-tagging categories (SVTSVT, SVTJP05 and 
SVTnoJP05). 

We first perform the analysis in each channel separately, which means computing a 
likelihood for each channel. Since all channels are statistically independent, we combine 
them by multiplying their likelihoods together, which is equivalent to considering all the 
bins of the discriminant output of the six channels juxtaposed in a single histogram. We 
have reduced the problem of multiple channels to the simpler problem of only one channel and 
we now proceed as in the previous section. 
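
The equivalence between multiplying the channel likelihoods and juxtaposing all the bins in 
one histogram can be checked numerically. The following sketch uses two hypothetical 
channels with invented bin contents and, as a simplification, omits the nuisance parameters; 
the dictionary layout and function names are assumptions of the example. 

```python
import math


def log_poisson(data, mu):
    """Logarithm of the Poisson probability of `data` events for an expectation `mu`."""
    return data * math.log(mu) - mu - math.lgamma(data + 1)


def channel_log_likelihood(f, channel):
    """Binned log-likelihood of one channel, given as a dict of per-bin lists."""
    return sum(log_poisson(d, b + f * s)
               for d, b, s in zip(channel["data"], channel["background"], channel["signal"]))


def combined_log_likelihood(f, channels):
    """Product of the channel likelihoods, i.e. the sum of their logarithms."""
    return sum(channel_log_likelihood(f, channel) for channel in channels)


def juxtapose(channels):
    """Concatenate the bins of all channels into a single histogram."""
    merged = {"data": [], "background": [], "signal": []}
    for channel in channels:
        for key in merged:
            merged[key].extend(channel[key])
    return merged


# Two invented channels standing in for two charged lepton and b-tagging category pairs.
channels = [
    {"data": [12, 5, 1], "background": [11.0, 4.5, 1.2], "signal": [0.1, 0.3, 0.6]},
    {"data": [30, 9], "background": [28.0, 8.0], "signal": [0.2, 0.5]},
]
```

Evaluating `combined_log_likelihood(f, channels)` and 
`channel_log_likelihood(f, juxtapose(channels))` at the same value of `f` gives the same 
number up to floating-point rounding. 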
10.5 Expected Limits and Observed Limits 



Before computing the limits using the real data distribution (an unblinded analysis) we 
evaluate the sensitivity of our analysis to a WH (plus ZH) signal (a blind analysis). We 
simulate the number of data events (pseudo-data) by drawing a random number from a Poisson 
probability distribution whose mean is the background prediction, which is itself allowed to 
fluctuate smoothly within its one standard deviation interval. Therefore, we do not use 
data, but pseudo-data, and we do not perform a real experiment, but pseudo-experiments. Each 
pseudo-experiment uses a different random number, so that the value of $D$ is specific to 
each pseudo-experiment. Also, in the pseudo-experiments we assume there is no signal at all 
($S = 0$). We compute an upper limit for each pseudo-experiment. The median of the 
distribution of upper limits is taken as the median expected limit and it characterizes the 
sensitivity of our analysis. To ensure enough pseudo-experiment statistics to compute the 
lower and upper one and two standard deviation bounds on the median expected limit, and yet 
limit the CPU consumption, we perform 3000 pseudo-experiments (see footnote 1). 

In contrast to the expected limit, which uses pseudo-data in many pseudo-experiments, the 
observed limit (the real result of our analysis) uses real data in only one experiment. Both 
the pseudo-experiments and the real experiment use the methodology described in the sections 
above. When we present our expected and observed limits, we check that the observed limit 
lies within the two standard deviation interval around the median expected limit for each 
Higgs boson mass point. 
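
The pseudo-experiment procedure can be sketched as follows. The code is an illustration 
under stated assumptions, not the MCLIMIT implementation: `limit_fn` is a hypothetical 
stand-in for the full limit calculation, the background is treated as a single number with 
one relative uncertainty, and the quantiles chosen for the one and two standard deviation 
bands (2.3%, 15.9%, 84.1% and 97.7%) are an assumed convention. 

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for means up to a few hundred."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def expected_limit_bands(background, rel_uncertainty, limit_fn, n_pseudo=3000, seed=7):
    """Run background-only pseudo-experiments and summarize the resulting limits."""
    rng = random.Random(seed)
    limits = []
    for _ in range(n_pseudo):
        # Let the background prediction fluctuate within its uncertainty (kept positive).
        scale = rng.gauss(1.0, rel_uncertainty)
        while scale <= 0.0:
            scale = rng.gauss(1.0, rel_uncertainty)
        pseudo_data = poisson_draw(rng, scale * background)  # no signal: S = 0
        limits.append(limit_fn(pseudo_data))
    limits.sort()

    def quantile(q):
        return limits[min(len(limits) - 1, int(q * len(limits)))]

    return {"-2s": quantile(0.0228), "-1s": quantile(0.1587), "median": quantile(0.5),
            "+1s": quantile(0.8413), "+2s": quantile(0.9772)}


def within_two_sigma(observed, bands):
    """Check that an observed limit lies inside the two standard deviation band."""
    return bands["-2s"] <= observed <= bands["+2s"]
```

Passing the limit calculation in as a function keeps the sketch independent of how the limit 
itself is computed; any function that maps a pseudo-data count to an upper limit can be 
plugged in. 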

10.6 WH Neural Network Upper Limits 

At CDF a binned likelihood technique such as the one described in the previous sections is 
implemented in the MCLIMIT [90] [91] package, which we use for our analysis as well. We 
search for a Higgs signal excess in our BNN output distributions. Since we find no evidence 
for such an excess, we set upper limits on the WH production cross section times branching 
ratio, $\sigma(p\bar{p} \to WH) \cdot BR(H \to b\bar{b})$. We present the upper limits as 
ratios to the Standard Model prediction (× SM) for TIGHT, ISOTRK and TIGHT combined with 
ISOTRK, each with all b-tagging categories combined, in Table 10.1 and in Figure 10.1. 

10.7 Impact of Our Original Contribution 

Since my original contribution to the WH analysis is the addition of the ISOTRK charged 
lepton category to the TIGHT charged lepton category, Table 10.2 shows for each Higgs boson 
mass point the expected and observed upper limits for the TIGHT category only and for TIGHT 
combined with ISOTRK, as well as the percentage by which the limits become smaller and thus 
better. Figure 10.2 shows in black the expected and observed limits for the TIGHT charged 
lepton category only and in red those for the combination of the TIGHT and ISOTRK 
categories. The improvement in both the expected and the observed limits seen in Table 10.2 
is also visible there graphically. 

Footnote 1: The computation of 3000 pseudo-experiments for each mass point, for each pair of 
charged lepton and b-tagging categories, and for all the categories combined takes on the 
order of 3 days using the parallel computing facilities of the CDF collaboration. 

CDF II Preliminary, 5.7 fb$^{-1}$ 

M(H) [GeV/$c^2$]      100    105    110    115    120    125    130    135    140    145    150

TIGHT
  Exp2sP             6.37   7.34   8.06   8.84   10.1   13.0   16.7   22.5   33.0   50.1   75.8
  Exp1sP             4.69   5.13   5.91   6.36   7.53   9.59   12.1   16.0   22.9   35.2   54.0
  Exp                3.23    3.6   4.03   4.46   5.29   6.69   8.41   11.2   16.1   25.1   38.5
  Exp1sM             2.22   2.48   2.76   3.15   3.71   4.67   5.90   7.92   11.2   17.9   27.6
  Exp2sM             1.58   1.80   1.98   2.36   2.73   3.49   4.38    5.8   8.27   13.3   20.3
  Obs                3.00   4.43   5.73    6.4   7.43    8.8   10.6   13.1   17.7   25.6   40.5

ISOTRK
  Exp2sP             11.9   13.0   14.4   16.2   18.4   21.8   27.7   36.4   53.6   79.2    117
  Exp1sP             7.29   8.17   9.45   10.3   11.7   13.8   17.6   23.2   33.5   52.0   78.6
  Exp                4.02   4.53   5.13   5.76   6.67   8.13   10.3   13.5   19.3   30.7   45.6
  Exp1sM             2.20   2.46   2.80   3.18   3.66   4.58   5.74   7.72   10.9   17.6   25.8
  Exp2sM             1.44   1.62   1.82   2.09   2.45   3.04   3.76   4.78   7.18   11.7  16.00
  Obs                5.19   4.61   6.68    7.6   7.34   9.74   12.1   18.8   17.5   41.5  71.20

TIGHT and ISOTRK
  Exp2sP             5.86   6.23   7.22   8.87   9.38   11.7   14.3   19.0   27.5   41.6   65.7
  Exp1sP             4.06   4.59   5.07   5.70   6.46   8.16   10.2   13.4   19.2   30.0   45.6
  Exp                2.73   3.04   3.47   3.79   4.44   5.62   7.04   9.20   13.1   21.1   31.2
  Exp1sM             1.79   2.05   2.29   2.66   3.08   3.78   4.80   6.48   9.04   14.6   21.6
  Exp2sM             1.27   1.47   1.64   1.90   2.18   2.74   3.38   4.12   6.37   10.4   13.7
  Obs                2.39   3.15   4.42   5.08   5.48   6.24   7.09   9.32   10.4   18.6   31.1

Table 10.1: Upper limits, expressed as multiples of the Standard Model prediction, for 
TIGHT charged leptons only, ISOTRK charged leptons only, and TIGHT and ISOTRK charged 
leptons combined, in all cases with all b-tagging categories combined, as a function of the 
SM Higgs boson mass using 5.7 fb$^{-1}$ at CDF. M(H) represents the hypothetical mass of the 
Higgs boson and is expressed in GeV/$c^2$. Exp represents the median expected limit using 
3000 pseudo-experiments. Exp2sP (Exp2sM) represents the upper (lower) bound of the two 
standard deviation interval around Exp. Exp1sP (Exp1sM) represents the upper (lower) bound 
of the one standard deviation interval around Exp. Obs represents the observed upper limit 
using data. 



10.8 Summary 



In this chapter we have presented the computation methodology and the results for the upper 
limit on the Higgs boson cross section times branching ratio as a function of the Higgs 
boson mass for the TIGHT, ISOTRK and TIGHT+ISOTRK cases, when all b-tagging categories are 
combined. In the first part of the chapter we introduced the methodology of the limit 
calculation for a one-bin analysis without and with systematic uncertainties. We then 
presented the methodology for a multi-bin analysis. Next, we presented how several channels, 
such as our b-tagging categories, are combined. We then introduced the pseudo-experiments, 
which simulate data by allowing the background prediction to fluctuate within its 
uncertainty, in order to measure the sensitivity of the analysis. We then presented the real 
measurement using real data. In the last part of the chapter we showed, both in plots and in 
tables, that adding the ISOTRK charged lepton category on top of the TIGHT charged lepton 
category improves both the expected and the observed upper limits significantly. 
[Figure 10.1 plots. Panel label: CDF Run II Preliminary, 5.7 fb$^{-1}$. Legend: observed 
limit, expected limit, pseudo-experiments ± 1σ band.] 

Figure 10.1: Expected and observed cross-section upper limits for the WH search in the 
TIGHT (top left), ISOTRK (top right) and TIGHT+ISOTRK (bottom) charged lepton categories, 
with all b-tagging categories combined, at CDF using 5.7 fb$^{-1}$, as a function of the 
Higgs boson mass between 100 GeV/$c^2$ and 150 GeV/$c^2$. The horizontal line at 1 
represents the Standard Model prediction. The expected upper limits are represented by the 
dashed line. The yellow (green) band represents the 1 (2) standard deviation interval around 
the expected upper limit. The observed upper limits are represented by the solid line. 
M(H) [GeV/$c^2$]      100    105    110    115    120    125    130    135    140    145    150

Expected limits
  TIGHT              3.23    3.6   4.03   4.46   5.29   6.69   8.41   11.2   16.1   25.1   38.5
  TIGHT+ISOTRK       2.73   3.04   3.47   3.79   4.44   5.62   7.04   9.20   13.1   21.1   31.2
  % Improvement      15.5   18.4   16.1   17.7   16.1   16.0   16.3   17.9   18.6   15.9   19.0

Observed limits
  TIGHT              3.00   4.43   5.73   6.40   7.43   8.80   10.6   13.1   17.7   25.6   40.5
  TIGHT+ISOTRK       2.39   3.15   4.42   5.08   5.48   6.24   7.09   9.32   10.4   18.6   31.1

Table 10.2: Expected and observed cross-section upper limits for the TIGHT and 
TIGHT+ISOTRK analyses, as well as the percentage improvement in the expected limit when 
ISOTRK is combined with TIGHT. 
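
The percentage improvement can be reproduced from the expected limits, assuming it is 
defined as the relative reduction of the expected limit with respect to TIGHT alone; this 
definition is an assumption, not something stated in the text. The short sketch below 
applies it to three of the mass points, with the median expected limits copied from 
Table 10.1. 

```python
def improvement_percent(tight_limit, combined_limit):
    """Relative reduction of the expected limit, in percent of the TIGHT-only limit."""
    return 100.0 * (tight_limit - combined_limit) / tight_limit


# Median expected limits (in units of the SM prediction) at three Higgs boson masses.
expected = {
    100: {"tight": 3.23, "combined": 2.73},
    125: {"tight": 6.69, "combined": 5.62},
    150: {"tight": 38.5, "combined": 31.2},
}

for mass, limits in expected.items():
    gain = improvement_percent(limits["tight"], limits["combined"])
    print(f"M(H) = {mass} GeV/c^2: {gain:.1f}% improvement")
```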




Figure 10.2: Expected and observed cross-section upper limits for the WH search, overlaid 
for the TIGHT category and for TIGHT combined with the ISOTRK category, with all b-tagging 
categories combined, at CDF using 5.7 fb$^{-1}$, as a function of the Higgs boson mass 
between 100 GeV/$c^2$ and 150 GeV/$c^2$. The horizontal line at 1 represents the Standard 
Model prediction. The expected upper limits are represented by the dashed lines in black 
(TIGHT) and red (TIGHT+ISOTRK). The yellow (green) band represents the 1 (2) standard 
deviation interval around the expected upper limit for TIGHT. The observed upper limits are 
represented by the solid lines in black (TIGHT) and red (TIGHT+ISOTRK). 
Chapter 11 

Conclusions and Discussions 



11.1 Summary 



In this dissertation we have presented an experimental test of the current theory of 
elementary particles and their interactions, the Standard Model (SM). All the predictions of 
the SM have been confirmed experimentally, except one, namely the existence of a new 
elementary particle called the Higgs boson. If the particle is discovered, the SM is 
confirmed experimentally. If the particle is excluded, then the SM is refuted, which means 
that the true model of nature is not the SM, but some other theory beyond the SM. The mass 
of the SM Higgs boson is not predicted by the theory, but direct and indirect experimental 
searches constrain it at 95% confidence level to lie between 114.4 and 158 GeV/$c^2$ or 
between 175 and 185 GeV/$c^2$. The Higgs boson mass preferred by indirect SM fits is towards 
the lower edge of the allowed mass ranges. We therefore performed an experimental search for 
the Higgs boson that is most sensitive to low Higgs boson masses. 

We study a sample of $p\bar{p}$ collisions at the Tevatron at a centre-of-mass energy 
$\sqrt{s} = 1.96$ TeV that corresponds to an integrated luminosity of 5.7 fb$^{-1}$ 
collected by the Collider Detector at Fermilab. There are many ways in which a Higgs boson 
could be produced at the Tevatron, but independently of the production mode, once produced, 
a Higgs boson would decay in the same way, depending only on its mass. For masses below 
135 GeV/$c^2$, the Higgs boson is expected to decay predominantly to a $b\bar{b}$ pair, 
whereas for higher masses it decays predominantly to a $W^+W^-$ pair. Given our preference 
for a low-mass Higgs boson, we choose the most promising channel to identify a Higgs boson 
that decays to a $b\bar{b}$ pair: the associated production of the Higgs boson with a W 
boson, where the W boson decays leptonically. Our analysis channel is therefore 
$WH \to \ell\nu b\bar{b}$. 

We select events consistent with a signature of a high-$p_T$ charged lepton (electron or 
muon) candidate, large missing transverse energy (MET) and exactly two jets. In order to 
improve the signal over background ratio, we require that at least one of the jets is 
identified as originating from a b quark. We use a sample of events without this requirement 
as a Pretag control sample. In order to discriminate further between signal and background 
events, we employ an artificial neural network. Finally, using a Bayesian statistical 
inference technique, we compute expected and observed upper limits on the cross section 
times branching ratio with respect to the SM prediction for Higgs boson masses between 100 
and 150 GeV/$c^2$. 

The main charged lepton categories at CDF are electron and muon candidates with stringent 
reconstruction criteria. An electron (muon) candidate is typically a high-$p_T$ isolated track 
that is matched to a calorimeter cluster (muon stub). We reconstruct tight electron (muon) 
candidates using an electron(muon)-inclusive trigger. We add together all the WH events 
selected using tight charged leptons into the TIGHT sample. 

Our detector has uninstrumented regions both at the calorimeter level, such as the small 
space between calorimeter towers, and at the muon detector level, where the eta-phi coverage 
is not uniform. We introduce a novel charged-lepton category with looser reconstruction 
criteria, namely a high-$p_T$ isolated track that is freed from the requirement to match a 
calorimeter cluster or a muon stub. Such charged-lepton candidates also recover real charged 
leptons that would otherwise have been lost in the non-instrumented regions of the detector. 
We call the WH sample collected in this way the ISOTRK sample. We make sure that the 
ISOTRK and TIGHT samples are orthogonal. 

As there is no ISOTRK-dedicated trigger at CDF, we use triggers that make use of the 
orthogonal information in the event, namely the MET and the jets. We have three such 
MET-based triggers at CDF. We parameterized at each of the three trigger levels the trigger 
efficiency turn-on curves as a function of the trigger MET and identified the appropriate 
jet kinematic selection so that the efficiency is flat with respect to the jets and varies 
only with respect to the trigger MET. We also measured the prescale of the one trigger that 
is prescaled. Since not all triggers were used for all the runs in our dataset, we measured 
the fraction of luminosity in which each of the possible combinations of the triggers was 
used. 

There are many possible ways to combine the triggers and indeed in the WH search our 
contribution on this topic was a work in progress. As such, the WH analysis using 2.7 
fb$^{-1}$ from the summer of 2008 used only one MET-based trigger. The WH analyses using 
4.3 fb$^{-1}$ from the summer of 2009 and using 5.7 fb$^{-1}$ from the summer of 2010 used two 
different MET-based triggers. The ISOTRK channel was an original contribution to these 
analyses. These results went into the CDF and Tevatron combinations from those years and 
were presented at the major summer conferences. 

In this latest analysis we use the same integrated luminosity as the WH analysis of the 
summer of 2010, because we focused on two main issues. The WH group at CDF, of which I am 
an active member, decided to develop a new data analysis software framework called WHAM, 
completely independent of the one used for the 2010 analysis. This new framework is more 
modular, more flexible and dedicated to the signature of a single charged lepton plus 
missing transverse energy plus jets. Several analyses such as WH, WZ, single top, 
technicolor and $t\bar{t}H$ are currently produced in this framework. WHAM allows the 
sharing of almost all the tools between the analyses and as such avoids redundancy, allows 
an analysis improvement to be propagated instantly to the other related analyses, and allows 
greater scrutiny of common pieces of code, which makes bugs easier to identify. I am one of 
the three main authors of the software framework. My first task was therefore to reproduce 
in the new framework the 2010 WH analysis in the TIGHT and ISOTRK categories, while my 
second task was to improve the MET-plus-jets parameterization in order to improve the ISOTRK 
category, the results of which are embodied in this dissertation. 

Our final result is represented by the expected and observed 95% CL upper limits on the 
Higgs boson cross section times branching ratio with respect to the SM when all categories 
are combined. The expected upper limits vary between 2.73 × SM and 31.2 × SM for a mass 
range of 100-150 GeV/$c^2$. The improvement in sensitivity due to the addition of the 
ISOTRK charged lepton category is between 16 and 19% over the entire mass range. The 
observed upper limits vary between 2.39 × SM and 31.1 × SM over the entire mass range. 
Since the lower limit on the Higgs boson mass set by the LEP experiments is 114.4 GeV/$c^2$, 
it is interesting to note that for a 115 GeV/$c^2$ Higgs boson we compute an expected upper 
limit of 3.79 × SM and an observed upper limit of 5.08 × SM. 



11.2 Future Prospects 



This section presents the future prospects and potential improvements of the WH search at
CDF, of the combined Higgs searches at the Tevatron and of the original method to combine
an unlimited number of MET-plus-jets triggers that I introduce in this analysis.

11.2.1 WH search at CDF 



By the time this thesis is submitted, about two months remain until
the summer conferences of 2011. I will lead the WH effort through the internal review
process and a possible subsequent publication for approximately 7 months, thanks to a
Universities Research Association [1] Visiting Scholar Postdoctoral Fellowship and a grant
for travel to Fermilab. It is both the goal of the CDF experiment and my personal goal
to improve this analysis with the newly available datasets and with as many as possible of the following
potential improvements in the analysis technique: add a forward electron charged lepton
category; migrate some of the current ISOTRK events into a separate loose muon category
and use triggers dedicated to them; replace the current b-tagging algorithms with a newer
one that has been produced by the CDF collaboration. These improvements are expected to
sharpen the analysis sensitivity significantly more than the addition of more integrated
luminosity alone would, as seen in the following section.

[1] Universities Research Association (URA) is the association of universities that manages Fermilab.
McGill University and the University of Toronto are the only universities in Canada that are members
of the URA.

11.2.2 Higgs search at the Tevatron



The cross section times branching ratio sensitivity of Higgs searches at the Tevatron typically
improves continuously for two reasons: larger integrated luminosity data sets, as the
Tevatron accelerator performs excellently and is scheduled to run until 30 September 2011; and
improved analysis techniques. If analysis improvements are ignored, the sensitivity
scales as $1/\sqrt{\int \mathcal{L}\,dt}$. Over the past few years, CDF has always managed to improve the
expected sensitivity by more than could be achieved just by using larger data sets, as can be
seen in Figure 11.1 for a Higgs boson mass of 115 GeV/$c^2$ (top) and 160 GeV/$c^2$ (bottom),
where datasets twice as large as those of CDF alone are assumed, simulating to first order an expected
Tevatron combination between the CDF and DZero experiments, which have similar
data sets and analysis sensitivity. Also, Figure 11.1 suggests that with about 9 fb$^{-1}$ of
integrated luminosity collected until the expected end of Run II at the Tevatron in summer
2011, if the potential analysis improvements identified are implemented, then the Tevatron
combination would approach Standard Model sensitivity at both low and high Higgs
boson mass.
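
As a rough illustration of the luminosity scaling quoted above, the following sketch projects an expected limit from one integrated luminosity to another, assuming the limit scales purely as $1/\sqrt{\int \mathcal{L}\,dt}$ with no analysis improvements. The function name `scale_expected_limit` is hypothetical; the inputs (3.79 × SM at 5.7 fb$^{-1}$ for a 115 GeV/$c^2$ Higgs boson, and a 9 fb$^{-1}$ target) are taken from this chapter. The output is only an arithmetic illustration, not a CDF projection.

```python
import math


def scale_expected_limit(limit_now, lumi_now, lumi_future):
    """Project an expected upper limit (in units of the SM prediction),
    assuming it scales as 1/sqrt(integrated luminosity) and that the
    analysis technique itself does not change (an assumption)."""
    if lumi_now <= 0 or lumi_future <= 0:
        raise ValueError("integrated luminosities must be positive")
    return limit_now * math.sqrt(lumi_now / lumi_future)


# Expected limit at 115 GeV/c^2 with 5.7 fb^-1, extrapolated to 9 fb^-1.
projected = scale_expected_limit(3.79, 5.7, 9.0)
print(f"{projected:.2f} x SM")  # prints "3.02 x SM"
```

Under this assumption alone the limit would improve by only about 20%, which is why the analysis improvements listed in the previous section matter more than the additional data.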

11.2.3 Trigger Combination Method 

Our original method to combine triggers is already in use by other CDF analyses, because
the method is implemented in a user-friendly software package called ABCDF that
we designed. The $WZ \to \ell\nu b\bar{b}$ and $ZH \to \nu\bar{\nu} b\bar{b}$ analyses will use the three triggers combined by the
novel method in the searches they prepare for the summer of 2011. The method has potential
applications in searches for physics beyond the Standard Model, such as supersymmetry, that
have large missing transverse energy and jets as key signatures. As such, it is already applied
in other CDF analyses and can be applied to analyses at DZero at the Tevatron or at
ATLAS and CMS at the Large Hadron Collider at CERN. The method has the advantage
that it incorporates an unlimited number of triggers and each trigger is allowed to have its
own jet selection, prescale and interval of applicability.

11.2.4 Higgs searches at the LHC 

Both the ATLAS and CMS experiments at the Large Hadron Collider have started to present
results on SM Higgs searches. As of April 2011, none of these searches has become more
sensitive than those of CDF and DZero at the Tevatron. However, the LHC has broken the
instantaneous luminosity record of the Tevatron, and as more data are rapidly collected by
the well-performing ATLAS and CMS detectors, their searches will eventually surpass the
Tevatron searches in sensitivity. For now, the jury is still out and the friendly competition between the
Tevatron and LHC is ongoing. Being in experimental particle physics is indeed living in
interesting times.

Figure 11.1: Projected expected median sensitivity scaling with total integrated luminosity
used by CDF, assuming a combination of two CDF experiments, as of July 2010. The
solid lines represent the sensitivity projections as a function of integrated luminosity with
analysis improvement, while the dots represent the actual limits set for a Higgs boson mass
of 115 GeV/$c^2$ (top) and 160 GeV/$c^2$ (bottom). The brown band represents the addition
of analysis improvements that are expected to be added to the CDF analyses in the future
and that could allow Standard Model sensitivity to be reached with the 9 fb$^{-1}$ expected to be
collected by the end of Run II of the Tevatron accelerator in the summer of 2011. Image
credit: the CDF collaboration.



11.3 Conclusions 



We have presented a WH search at CDF and we introduced a novel charged lepton category
that improved the sensitivity of the search by 16-19% across the Higgs boson mass range 100-150
GeV/$c^2$. In the process we developed a novel method to combine an unlimited number
of MET-plus-jets triggers, which is already being used by other CDF analyses and has the
potential to be useful for other experiments as well, since triggers based on this signature
are key to some scenarios of physics beyond the Standard Model.

Appendix A



MET-based Trigger 
Parametrization 



Events that do not contain a tight charged lepton, but have a high-$p_T$ track isolated from
other activity in the tracking system, are called isolated track (ISOTRK) events. This new
charged lepton category has looser reconstruction criteria than the tight charged leptons
and represents my original contribution to the WH analysis. At CDF the tight charged
lepton events are collected using dedicated inclusive electron or muon triggers. However,
there is no trigger dedicated to ISOTRK events. For this reason, we use triggers based
on the information orthogonal to the charged lepton, namely the missing transverse energy
(MET) and jets. We call them generically MET-based triggers. At CDF we have three such
triggers. We denote them MET2J, MET45 and METDI. For Monte Carlo simulated
events we have to model the trigger selection by applying, on an event-by-event basis, a weight
that represents the trigger efficiency. This chapter describes the parametrization that we
measured for these MET-based triggers used in the analysis for the ISOTRK category. The
novel method we introduced to combine these three MET-based triggers is described in
Appendix B.

A.1 Three MET-based Triggers at CDF

There are three trigger levels at CDF, which we denote L1, L2 and L3. At each trigger level,
quantities are reconstructed more accurately, using successively greater computing resources,
than at the previous trigger level. As a general rule, trigger requirements become more
stringent as the trigger level increases. The offline event selection is then even tighter.
In this section we describe the trigger requirements for each of the three MET-based
triggers employed in this analysis.

A.1.1 MET + 2 Jets Trigger



The MET + 2 jets trigger (MET2J) has been active since the beginning of Run II at
the Tevatron. A data event fires L1 of MET2J if it has a MET larger than 28 GeV
($\not\!\!E_T > 28$ GeV), in which case it is sent automatically to be studied by L2. In order to pass
the L2 requirements, a data event must have $\not\!\!E_T > 30$ GeV and at least two jets, one of
them with a transverse energy $E_T > 20$ GeV and reconstructed in the central region of the
detector ($|\eta| < 1.1$) and the other jet with $E_T > 15$ GeV and $|\eta| < 2.0$. Not all events that
pass the requirements at L2 are sent to be analyzed by L3, which means that this trigger
is prescaled. The prescale is applied automatically as a function of the instantaneous
luminosity. We measure the prescale of this trigger, as seen in Section A.6. Events that
reach L3 and meet the requirement of $\not\!\!E_T > 35$ GeV fire the full MET2J trigger and are
saved on tape to be used in our analysis. Real data events that do not meet
these trigger criteria are not stored and therefore not used in the analysis. For this
reason, Monte Carlo simulated events must reflect the same selection. Rather than discarding
simulated events, we use all of them to preserve the statistics, but weight them to simulate the trigger selection.

Over time, there were four major versions of the MET2J trigger used at CDF. The trigger
evolved as the instantaneous luminosity increased and as the requirements of the specific
physics groups changed. For example, previous versions required $\not\!\!E_T > 25$ GeV at L1,
did not require that one of the jets be central at L2, and had no prescale at L2. For
a given run, only one version of the trigger was used. In this analysis we parametrize
the trigger efficiency averaged over the several historical versions, as if there were only
one version. We make sure our offline requirements are tighter than the most recent and
stringent requirements of the MET2J trigger.

A.1.2 MET Trigger

The MET-only trigger has requirements only on $\not\!\!E_T$, but not on jets. This trigger has also
been in existence since the beginning of Run II. It comes in two historical versions. The
first version was used for the first 2.3 fb$^{-1}$ of the integrated luminosity and required
$\not\!\!E_T > 25$ GeV at L1, $\not\!\!E_T > 25$ GeV at L2 and $\not\!\!E_T > 45$ GeV at L3. The second version
was used afterwards, for the remaining 3.4 fb$^{-1}$ of the integrated luminosity used in this
analysis. The physics motivation was to lower the $\not\!\!E_T$ threshold in order to select more events from
rare processes, such as Higgs boson production or physics beyond the Standard Model. As such, it requires
$\not\!\!E_T > 28$ GeV at L1, $\not\!\!E_T > 35$ GeV at L2 and $\not\!\!E_T > 40$ GeV at L3. Just as in the case
of the MET2J trigger, we parametrize the trigger as if it had only one version, which we call
MET45. This trigger was never prescaled.

One caveat is that early in the data taking there was a bug in the MET45 trigger:
although the event fired the trigger and the information was stored, the information is not to be
trusted. Therefore, in the run range 178637-192363, which corresponds to approximately 3% of the total
integrated luminosity used in this analysis, we treat data events as if the MET45 trigger
was not defined and therefore could not have fired. We have to simulate this in Monte Carlo
events as well. The novel method we introduce to combine triggers takes this into account
easily, as seen in Appendix B.

A.1.3 MET + Dijet Trigger

The third and last MET-based trigger at CDF is the MET + dijet trigger. As its name
suggests, it is very similar to the MET2J trigger, and in order to avoid confusion it is denoted
in this thesis as METDI. This trigger was first introduced when about 2.4 fb$^{-1}$ of integrated
luminosity had already been collected and has never been changed since. It was
designed by the Higgs Trigger Task Force and was optimized for the Higgs boson search.
This trigger was never prescaled. Since this trigger was applied only in about 61% of the
integrated luminosity (Table A.1), we also have to simulate that in Monte Carlo events. The novel
method we implemented does that easily, as described in Appendix B.

A.2 Variable Choice for Trigger Parametrization

We want to parametrize the trigger efficiency turnon curves as a function of only one variable
that is common to all three triggers, and apply cuts so that we are in the plateau regions
with respect to all the other variables. For the MET-based triggers, the natural quantity
for the parametrization is the missing transverse energy, and the trigger-specific selections
are based on the kinematic distributions of the two jets in the event.

Since the parametrization is performed using an offline data sample and is also applied
offline in the analysis, we need to choose one $\not\!\!E_T$ quantity computed
offline that is as close as possible to the $\not\!\!E_T$ quantities used at trigger level. As discussed in
Section 4.5, the fully corrected $\not\!\!E_T$, on which we apply a cut at 20 GeV for all events in our
analysis, is the raw $\not\!\!E_T$ corrected for the position of the primary interaction vertex, for the jet
energies in the event and for the energy deposited by the muon in the calorimeter (which
is relevant in the case of ISOTRK charged lepton events, which are muons 85% of the
time). From a physics point of view, the fully corrected $\not\!\!E_T$ represents the missing transverse energy due to the
neutrino in the final state. However, these corrections are not performed at trigger level and
therefore the physical meaning of the trigger $\not\!\!E_T$ is the missing transverse energy of the W
boson (and not of the neutrino!).

Ideally we should choose the raw $\not\!\!E_T$ for our parametrization. However, studies have shown that
this variable is not well modelled in the control sample (Pretag) of the analysis. Therefore
we correct this quantity for the position of the primary interaction vertex and the energy of
the jets, but not for the energy of the muon. Its physics meaning remains the missing transverse
energy of the W boson, but it is now modelled better. We denote this quantity trigMET
and we use it for the trigger turnon curve efficiency parametrization.

A.3 Trigger-Specific Jet Selection



The next step is to identify for each trigger the specific jet cuts such that, for the remaining
data events, the turnon curve parametrization is indeed flat in all jet quantities and
depends only on trigMET. All events in our analysis must have exactly two jets with
$E_T > 20$ GeV and $|\eta| < 2.0$.

The jet selection specific to the MET2J trigger is the following: both jets need to have
$E_T > 25$ GeV, one of them must be in the central region of the detector ($|\eta| < 0.9$),
while the second jet must have $|\eta| < 2.0$, and the two jets have to be spatially separated
($\Delta R > 1.0$).

The MET45 trigger does not have any jet requirements at trigger level. Therefore, for
this trigger the jet selection is the same as for the non-ISOTRK charged lepton categories,
namely exactly two jets with $E_T > 20$ GeV and $|\eta| < 2.0$.

For the METDI trigger our studies showed that the optimal jet selection requires the most
energetic jet to have $E_T > 40$ GeV, the second most energetic jet to have $E_T > 25$ GeV
and both jets to have $|\eta| < 2.0$.

We can already see that each trigger must be applied only in its specific jet kinematic region,
which is equivalent to assuming the trigger is not defined in the other kinematic regions.
The method we introduce in Appendix B also takes this easily into account.
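
The three trigger-specific jet selections can be sketched as simple predicates. This is a minimal illustration, not the analysis code: the `Jet` class, the function names and the representation of an event as a list of its analysis jets are all assumptions made here, while the numerical cuts are the ones quoted in this section.

```python
import math
from dataclasses import dataclass


@dataclass
class Jet:
    et: float   # transverse energy in GeV
    eta: float  # pseudorapidity
    phi: float  # azimuthal angle in radians


def delta_r(a, b):
    """Separation in the eta-phi plane, with the phi difference wrapped to [0, pi]."""
    dphi = abs(a.phi - b.phi) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a.eta - b.eta, dphi)


def passes_met45_jets(jets):
    """Baseline selection: exactly two jets with E_T > 20 GeV and |eta| < 2.0."""
    return len(jets) == 2 and all(j.et > 20.0 and abs(j.eta) < 2.0 for j in jets)


def passes_met2j_jets(jets):
    """Both jets E_T > 25 GeV, at least one central (|eta| < 0.9), Delta R > 1.0."""
    if not passes_met45_jets(jets):
        return False
    a, b = jets
    one_central = abs(a.eta) < 0.9 or abs(b.eta) < 0.9
    return a.et > 25.0 and b.et > 25.0 and one_central and delta_r(a, b) > 1.0


def passes_metdi_jets(jets):
    """Leading jet E_T > 40 GeV, second jet E_T > 25 GeV, both |eta| < 2.0."""
    if not passes_met45_jets(jets):
        return False
    lead, sub = sorted(jets, key=lambda j: j.et, reverse=True)
    return lead.et > 40.0 and sub.et > 25.0


event = [Jet(45.0, 0.3, 0.0), Jet(30.0, 1.5, 2.5)]
print(passes_met2j_jets(event), passes_met45_jets(event), passes_metdi_jets(event))
```

An event that fails a predicate is treated as if the corresponding trigger were not defined for it, which is how the jet selection enters the combination method later on.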

A.4 Parametrization for MET-based Triggers

Since we use these MET-based triggers for the ISOTRK charged lepton category and previous
studies have shown that ISOTRK candidates are muon candidates in 85% of cases,
we measure the trigger turnon curves using a data sample collected with an inclusive muon
trigger. We require exactly one reconstructed CMUP muon candidate which fires the CMUP
trigger. We compute trigger efficiency turnon curves for each of the three MET-based triggers
and for each of the three trigger levels in such a way that none of these turnon curves
takes into account the prescale of the trigger, if any. In the following paragraph we describe
the procedure for one generic case.

We select the subset of events that pass the jet selection specific to this trigger. For these
events we fill a histogram of the variable trigMET. This is the denominator histogram. We
then fill another histogram of the same variable, but only for the events that also fired the
chosen MET-based trigger. This is the numerator histogram. Since the CMUP and the
MET-based triggers are uncorrelated for all practical purposes, we divide the numerator by the
denominator histogram to obtain the efficiency turnon histogram for the chosen trigger.
We fit the efficiency histogram to a sigmoid function of trigMET with four parameters,
given by

$$\mathrm{Eff}(\mathrm{trigMET}) = c_3 + \frac{c_0 - c_3}{1 + e^{(c_1 - \mathrm{trigMET})/c_2}} , \qquad \mathrm{(A.1)}$$

where $c_0$ represents the highest plateau efficiency, $c_1$ represents the central value of the
turnon region (and is measured in GeV), $c_2$ represents the width of the turnon region (and
is also measured in GeV) and $c_3$ represents the lowest efficiency value. The fit returns the
four parameters, which uniquely define the efficiency as a function of trigMET only, for a
given trigger and trigger level.
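
The procedure above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used in the analysis: the function names are hypothetical, the sigmoid is written in the form $c_3 + (c_0 - c_3)/(1 + e^{(c_1 - x)/c_2})$, which is the form consistent with the parameter meanings given above, and the parameter values and toy events in the usage lines are invented purely for demonstration. The fit itself is not shown.

```python
import math


def turnon_efficiency(trig_met, c0, c1, c2, c3):
    """Four-parameter sigmoid: c0 upper plateau, c1 midpoint (GeV),
    c2 width (GeV), c3 lower plateau. Assumed functional form."""
    return c3 + (c0 - c3) / (1.0 + math.exp((c1 - trig_met) / c2))


def efficiency_histogram(trig_met_values, fired_flags, edges):
    """Divide the numerator histogram (events that fired the MET-based
    trigger) by the denominator histogram (all events passing the jet
    selection), bin by bin. Empty bins are returned as None."""
    n_bins = len(edges) - 1
    denominator = [0] * n_bins
    numerator = [0] * n_bins
    for value, fired in zip(trig_met_values, fired_flags):
        for i in range(n_bins):
            if edges[i] <= value < edges[i + 1]:
                denominator[i] += 1
                if fired:
                    numerator[i] += 1
                break
    return [n / d if d > 0 else None for n, d in zip(numerator, denominator)]


# Toy usage with invented numbers (not measured values).
toy_met = [10.0, 30.0, 35.0, 60.0, 65.0]
toy_fired = [False, False, True, True, True]
print(efficiency_histogram(toy_met, toy_fired, [0.0, 20.0, 40.0, 80.0]))
print(turnon_efficiency(40.0, 0.98, 40.0, 4.0, 0.0))
```

At trigMET equal to $c_1$ the curve sits halfway between the two plateaus, which is a quick sanity check on any fitted set of parameters.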

In order to ensure that the turnon curves do not include the effect of a possible trigger
prescale (the method is general, although we know that only the MET2J trigger is prescaled, and
only at L2), for L2 we include in the denominator the requirement that the event was sent
from L1 to L2, and for L3 that the event was sent from L2 to L3.

The nine turnon curves are presented in Figure A.1 (MET2J trigger), Figure A.2
(MET45 trigger) and Figure A.3 (METDI trigger).

A.5 Parametrization for Systematic Uncertainty Evaluation

We repeat the procedure above in bins of the following kinematic quantities: $E_T$, $\eta$ and $\phi$
of both jets, the absolute values of $\Delta\eta$, $\Delta\phi$, $\Delta R$ and $\Delta E_T$, as well as the fully corrected $\not\!\!E_T$ of the
analysis and the fraction of total luminosity that corresponds to each run. The number of
bins is chosen automatically by the code in order to have a minimum specified number of
events in the turnon region of the distribution, so that the fit is performed correctly.

Whereas the standard trigger weight is the value of the central turnon curve at the event
trigMET, the systematic weight that corresponds to the variable $\eta$ is given by the same
trigMET applied to a different turnon curve, specific to the bin in which the event's
$\eta$ falls. The same is repeated for all systematic variables. The total weighted average is
compared between the central turnon curve and each of the systematic variations, and the
largest percentage difference is quoted as the systematic uncertainty of the analysis. This
procedure is general and applies to all trigger combination methods, including the one we
introduce in this analysis and present in the next chapter.
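
A minimal sketch of this comparison is given below. It is an illustration only: the event representation (a dictionary per event), the function names and the toy numbers are assumptions made here, the averaging is a plain unweighted mean of the trigger weights, and the turnon curve is the four-parameter sigmoid in the form assumed from the parameter descriptions in Section A.4.

```python
import math


def sigmoid(x, c0, c1, c2, c3):
    """Turnon curve: c0 upper plateau, c1 midpoint, c2 width, c3 lower plateau."""
    return c3 + (c0 - c3) / (1.0 + math.exp((c1 - x) / c2))


def mean_weight(events, curve_for_event):
    """Average trigger weight when each event uses the curve chosen for it."""
    total = sum(sigmoid(ev["trig_met"], *curve_for_event(ev)) for ev in events)
    return total / len(events)


def trigger_systematic_percent(events, central_curve, binned_curves):
    """Largest percentage change of the mean weight when the central curve is
    replaced by per-bin curves of one systematic variable at a time.

    binned_curves maps a variable name to (bin_index_function, list of
    parameter tuples, one per bin)."""
    nominal = mean_weight(events, lambda ev: central_curve)
    largest = 0.0
    for bin_of, curves in binned_curves.values():
        varied = mean_weight(events, lambda ev: curves[bin_of(ev)])
        largest = max(largest, abs(varied - nominal) / nominal * 100.0)
    return largest


# Toy inputs, invented for demonstration only.
events = [{"trig_met": 50.0, "eta": 0.2}, {"trig_met": 50.0, "eta": 1.5}]
central = (1.0, 40.0, 5.0, 0.0)
binned = {
    "eta": (lambda ev: 0 if abs(ev["eta"]) < 1.0 else 1,
            [(1.0, 40.0, 5.0, 0.0), (0.9, 40.0, 5.0, 0.0)]),
}
print(trigger_systematic_percent(events, central, binned))
```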

A.6 Prescales for MET-based Triggers

Since the MET45 and METDI triggers are unprescaled, their prescales are 1.000 ± 0.000.
However, the MET2J trigger is prescaled at L2 for a large fraction of the integrated luminosity.
We computed the total integrated luminosity for MET2J and divided it by
the total integrated luminosity of another trigger (which requires a jet with an energy larger than 100
GeV) that has also been continuously active since the beginning of Run II and was never
prescaled. From this ratio we computed the prescale for MET2J to be 0.92 ± 0.05.

Figure A.1: MET2J trigger turn-on curves, parameterized as a function of trigMET. The
figures show, from top to bottom, the L1, L2 and L3 turn-on curves, respectively. The
turn-on curves were measured in the full dataset used in this analysis, and do not include
the effect of the prescale for the MET2J trigger.

Figure A.2: MET45 trigger turn-on curves, parameterized as a function of trigMET. The
figures show, from top to bottom, the L1, L2 and L3 turn-on curves, respectively. The
turn-on curves were measured in the full dataset used in this analysis.

Figure A.3: METDI trigger turn-on curves, parameterized as a function of trigMET. The
figures show, from top to bottom, the L1, L2 and L3 turn-on curves, respectively. The
turn-on curves were measured in the full dataset used in this analysis.

A.7 Integrated Luminosities for MET-based Triggers

Not all triggers were defined at the same time. In order to properly simulate that in Monte
Carlo events, we measured the fraction of integrated luminosity for which each combination of
MET-based triggers was defined, as shown in Table A.1.



MET2J    MET45    METDI    Fraction
No       No       No
No       No       Yes
No       Yes      No       0.13%
No       Yes      Yes
Yes      No       No       2.53%
Yes      No       Yes
Yes      Yes      No       36.52%
Yes      Yes      Yes      60.81%



Table A.l: Fraction of integrated luminosity where MET-based triggers are defined for each 
possible combination of the MET-based triggers. 



We now have all the information about the triggers needed in order to combine them in our
analysis. We introduce a novel trigger combination technique that we present in the next
chapter.

Appendix B

Novel Method to Combine
Triggers

Once the MET-based triggers are parametrized, there are many ways they can be combined
in an analysis. Our trigger parametrizations were used for the WH analyses for the summers
of 2008, 2009 and 2010, which were presented in two Ph.D. theses, CDF and Tevatron
Higgs combinations, one PRL paper and two PRD drafts under preparation, as presented
in detail in the section entitled "Original Contributions".

B.1 The Method for the 2.7 fb$^{-1}$ WH search

The analysis for the summer of 2008 used 2.7 fb$^{-1}$ of integrated luminosity. It used only
one MET-based trigger, namely MET2J, and therefore there were no complications. For
both Monte-Carlo-simulated and data events the same single trigger was used. For data
events, we check whether the event fired the trigger. If it did, the event is kept and is
given a weight of exactly 1.0. If not, the event is rejected, which is equivalent to receiving
a weight of exactly 0.0.

As a side note, this analysis used a simplified and less precise trigger parametrization than
the one we described in Appendix A. The analysis used only one trigger efficiency turnon
curve, measured across all trigger levels and including the trigger prescale. The trigger
weight was applied to all Monte-Carlo-simulated events without exception.

B.2 The Method for the 4.3 and 5.7 fb$^{-1}$ WH searches

The analysis for the summer of 2009 (2010) used 4.3 (5.7) fb$^{-1}$ of integrated luminosity.
Both analyses used two MET-based triggers, namely MET2J and MET45. Since MET2J
could be applied only in a subset of the jet kinematic phase space where MET45 could
be applied and since efficiency studies suggest that MET2J is more efficient than MET45,
it was decided that for events with two jets with $E_T > 25$ GeV, one central jet with
$|\eta| < 0.9$ and the other jet with $|\eta| < 2.0$, and with non-overlapping jets with $\Delta R > 1.0$,
only MET2J would be used, whereas for the remaining phase space of two jets with
$E_T > 20$ GeV and $|\eta| < 2.0$ only MET45 would be used. This method is very clean and
brings no complications, since within a given kinematic region both the Monte-Carlo-simulated and
data events use the same trigger.

For a given event, the jet kinematic region was checked. If the event was in the tight
kinematic region and it was a data event, then we checked whether it fired the MET2J
trigger. If it did, the event was kept. Otherwise, it was rejected. If the event was simulated,
then it was given a weight equal to the product of the weights from the L1, L2 and L3 turnon
curves for MET2J and the prescale of MET2J. If the event
was not in the tight jet kinematic region, but it was in the looser kinematic region, then
the procedure described above was done for the MET45 trigger, with the caveat that if the
data event was in the particular run range where the MET45 trigger had a bug, then the
event was rejected even if the trigger fired. But how to model that correctly in Monte Carlo
simulation? Since the effect was small, on the order of 2.6% (Table A.1), it was
ignored and included in the systematic uncertainty of the procedure.
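
The two-region logic just described can be summarized as a short sketch. The function names and the way an event is represented (boolean region flags, a dictionary of trigger bits, three per-level weights) are assumptions made for illustration; only the decision logic and the MET2J prescale of 0.92 come from the text.

```python
MET2J_PRESCALE = 0.92  # average MET2J prescale measured in Section A.6


def assign_trigger(in_tight_region, in_loose_region):
    """Choose the trigger from the jet kinematic region alone,
    before any trigger bit is inspected."""
    if in_tight_region:
        return "MET2J"
    if in_loose_region:
        return "MET45"
    return None


def data_weight(trigger, fired, in_met45_bug_runs=False):
    """Data events get weight 1.0 if the assigned trigger fired, else 0.0.
    MET45 events from the run range with the trigger bug are rejected."""
    if trigger is None:
        return 0.0
    if trigger == "MET45" and in_met45_bug_runs:
        return 0.0
    return 1.0 if fired.get(trigger, False) else 0.0


def mc_weight(trigger, w_l1, w_l2, w_l3):
    """Simulated events get the product of the three level weights
    and the prescale of the assigned trigger."""
    if trigger is None:
        return 0.0
    prescale = MET2J_PRESCALE if trigger == "MET2J" else 1.0
    return w_l1 * w_l2 * w_l3 * prescale


trigger = assign_trigger(in_tight_region=True, in_loose_region=True)
print(trigger, data_weight(trigger, {"MET2J": False, "MET45": True}))
```

In the usage line the data event is rejected even though MET45 fired, because the tight region was assigned to MET2J in advance. This is the acceptance cost discussed next.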

The method described above does not use a logical "OR" between the MET2J
and MET45 triggers, in order to avoid the correlations between the triggers. The price to
pay is a smaller event acceptance. The advantage is that the systematic uncertainty is easier to
calculate correctly and is smaller than in the case of correlated triggers. The main feature
of the method is that, in the kinematic phase space of the jet
selection, one trigger is assigned to one region and another trigger is assigned to another region.
We stress that this is done before the data events are checked for whether the trigger fired. The choice
of kinematic regions and of the trigger for each kinematic region is based on an a priori
study. In the case of the particular method described above, the turnon
curves for the MET2J (Figure A.1) and the MET45 (Figure A.2) triggers were compared
and it was observed that the MET2J trigger has its turnon region at smaller trigMET values
than MET45 and therefore has a potentially larger efficiency than MET45. In the kinematic
region that was loose but not tight, only the MET45 trigger could be used, so there
was no dilemma. For the tight jet region, though, since correlation between triggers was to
be avoided, only one trigger had to be chosen, and based on the reasoning above this was
MET2J.

This procedure increased the signal acceptance over using MET2J only, but it clearly does
not provide the maximum signal acceptance. For example, for some events in the tight jet
region it is possible that the trigger weight is larger for MET45 than for MET2J, or that the
event was not selected by the MET2J stream due to prescaling or another effect, but was
selected by the MET45 trigger. Why not divide this kinematic region further into smaller
kinematic regions and for each of them reevaluate whether to use the MET2J or the MET45
trigger? But how should this division be done? And how to take into account the fact
that for 2.6% of data events the MET45 trigger was not defined? And what about also using
the METDI trigger, which is defined only for about 60.8% of the integrated luminosity
(Table A.1)? It seems there would be very many kinematic regions and
very complicated bookkeeping.

In the next section we propose a general method that solves all these problems for
combining any number of triggers. We apply this method to optimize the signal acceptance for
our WH analysis.

B.3 The Novel Method for Combining Triggers 

The solution we propose in order to maximize the event selection while using only one trigger
per kinematic region with minimal bookkeeping is to consider the largest possible
number of kinematic regions, i.e. the number of events in that particular Monte-Carlo-simulated
or data sample. Our idea is to consider each event as an independent kinematic
region. Just as in the case of the method above, we study a priori which is the most efficient
trigger in the kinematic region, choose that trigger and ignore the other triggers. Since the
kinematic region is now a single event, this means we choose the trigger that has a priori
the largest probability to fire on that event. Below we go into the details of how such a
probability is computed for every trigger for the particular event. Once the probabilities are computed,
we choose the trigger with the largest a priori probability. If all
probabilities are strictly zero, then the event is rejected, whether it is a data event or a MC
event. This is equivalent to a study done for a particular kinematic region that assigns
all events in that kinematic region to only one trigger, just as MET2J
was preferred to MET45 in the tight jet region of the previous method. For a data event,
we check whether the event fired the chosen trigger. If it did, the event is kept, or equivalently
is assigned a weight of strictly 1.0. If the event did not fire the trigger, the event is thrown
away, or equivalently is assigned a weight of strictly 0.0, and the other triggers are not
checked at all. Ignoring the other triggers is the main feature that ensures the orthogonality
between triggers. It is crucial that the trigger is chosen before checking whether it
fired. For a Monte Carlo event it is assumed automatically that the trigger has fired and
the probability of the chosen trigger is returned as the trigger weight.

The a priori probability that an event fires a particular trigger is given by the product
of the weights at L1, L2 and L3 for that particular trigger, the prescale for that particular
trigger, and two factors that encode the jet selection and whether the trigger is defined:

$$P = w_{L1} \cdot w_{L2} \cdot w_{L3} \cdot PS \cdot JS \cdot TD , \qquad \mathrm{(B.1)}$$

where $w_{L1}$, $w_{L2}$, $w_{L3}$ are the weights of that particular trigger at L1, L2 and L3, respectively,
are given by Formula A.1 and vary as a function of the event trigMET; PS represents
the average prescale of the evaluated trigger, with the same value for all events, given by
Formula B.2; JS represents the jet selection and is 1 if the jet selection for that particular
trigger is met by the event and 0 if it is not, given by Formula B.3; TD represents the
condition that the chosen trigger is defined for the event and is 1 if it is and 0 if it
is not, given by Formula B.4.

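$$PS = \begin{cases} 0.92, & \text{if MET2J} \\ 1.00, & \text{if MET45} \\ 1.00, & \text{if METDI} \end{cases} \qquad \mathrm{(B.2)}$$

$$JS = \begin{cases} 1.00, & \text{if chosen trigger jet selection is passed} \\ 0.00, & \text{if chosen trigger jet selection is failed} \end{cases} \qquad \mathrm{(B.3)}$$

$$TD = \begin{cases} 1.00, & \text{if chosen trigger is defined} \\ 0.00, & \text{if chosen trigger is not defined} \end{cases} \qquad \mathrm{(B.4)}$$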
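
The per-event choice described in this section can be sketched as follows. This is not the ABCDF implementation: the function names and the event representation are hypothetical, and the level weights in the usage lines are invented numbers. The prescales are those of Formula B.2, and the product is Formula B.1.

```python
PRESCALE = {"MET2J": 0.92, "MET45": 1.00, "METDI": 1.00}  # Formula B.2


def a_priori_probability(trigger, level_weights, jet_selection_passed, trigger_defined):
    """Formula B.1: P = w_L1 * w_L2 * w_L3 * PS * JS * TD."""
    w_l1, w_l2, w_l3 = level_weights
    js = 1.0 if jet_selection_passed else 0.0   # Formula B.3
    td = 1.0 if trigger_defined else 0.0        # Formula B.4
    return w_l1 * w_l2 * w_l3 * PRESCALE[trigger] * js * td


def choose_trigger(probabilities):
    """Pick the trigger with the largest a priori probability.
    Returns (None, 0.0) if every probability is zero.
    Ties are resolved by dictionary order (a choice made in this sketch)."""
    best = max(probabilities, key=probabilities.get)
    if probabilities[best] <= 0.0:
        return None, 0.0
    return best, probabilities[best]


def event_weight(probabilities, is_data, fired=None):
    """Data: 1.0 if the chosen trigger fired, else 0.0 (other triggers are
    never consulted). Monte Carlo: the probability of the chosen trigger."""
    trigger, probability = choose_trigger(probabilities)
    if trigger is None:
        return 0.0
    if is_data:
        return 1.0 if fired.get(trigger, False) else 0.0
    return probability


probs = {
    "MET2J": a_priori_probability("MET2J", (1.0, 0.9, 1.0), True, True),
    "MET45": a_priori_probability("MET45", (1.0, 0.8, 1.0), True, True),
    "METDI": a_priori_probability("METDI", (1.0, 1.0, 1.0), True, False),
}
print(choose_trigger(probs)[0], event_weight(probs, is_data=False))
```

Note that the trigger is selected from the probabilities alone, so the same function serves data and simulation, and the trigger bits enter only in the final step for data.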

For data events, TD is evaluated easily, as each event contains the information whether the trigger
is defined or not. For MET45 we also consider the trigger not defined for the 2.6% of
the integrated luminosity where the trigger has a bug. As seen in Table A.1, for 0.13%
of the integrated luminosity no MET-based trigger is defined, for 2.53% of the integrated
luminosity only MET2J is defined, for 36.52% of the integrated luminosity only MET2J
and MET45 are defined and for 60.81% of the integrated luminosity all three MET-based
triggers are defined. The simulation of this in Monte Carlo was the really tricky part with
the previous methods.



In our approach, for every Monte Carlo simulated event a random number is drawn from 
a uniform distribution between 0 and 1. The number assigns the event to one of the 
integrated-luminosity intervals, with a probability equal to the fraction of the integrated 
luminosity in that interval, and the interval decides which triggers are defined for the MC 
event. If the random number is in the interval [0.0000, 0.0013], then all triggers are assumed 
to be undefined and TD=0.0 for all triggers. For the interval [0.0013, 0.0266], TD=1.0 for 
MET2J and TD=0.0 for MET45 and METDI. For the interval [0.0266, 0.3918], TD=1.0 for MET2J 
and MET45 and TD=0.0 for METDI. For the interval [0.3918, 1.0000], TD=1.0 for all three 
MET-based triggers. 
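
The random assignment described above can be sketched in a few lines of Python. This is only 
an illustration and not the ABCDF code: the function and variable names are hypothetical, the 
interval edges are the ones quoted in the text, and the separate 2.6% of integrated luminosity 
where MET45 has a bug is not treated here. 

```python
import random

TRIGGERS = ("MET2J", "MET45", "METDI")

# Upper edges of the cumulative integrated-luminosity intervals quoted in the
# text, paired with the triggers that are defined inside each interval.
LUMI_INTERVALS = [
    (0.0013, ()),
    (0.0266, ("MET2J",)),
    (0.3918, ("MET2J", "MET45")),
    (1.0000, ("MET2J", "MET45", "METDI")),
]


def td_flags(defined):
    """Build the TD value (1.0 or 0.0) of every trigger."""
    return {name: 1.0 if name in defined else 0.0 for name in TRIGGERS}


def trigger_defined(u):
    """Map a number u in [0, 1] to the TD values of the three triggers."""
    for upper_edge, defined in LUMI_INTERVALS:
        if u < upper_edge:
            return td_flags(defined)
    # Only reached for u == 1.0, which belongs to the last interval.
    return td_flags(LUMI_INTERVALS[-1][1])


def sample_trigger_defined(rng):
    """Draw one uniform random number and return the TD values for an MC event."""
    return trigger_defined(rng.random())


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed, for reproducibility of the sketch
    print(sample_trigger_defined(rng))
```

Over many simulated events, the fraction of events with a given set of defined triggers 
approaches the corresponding fraction of the integrated luminosity. 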

Formula B.1 ensures that each trigger is allowed to compete with the other triggers, to decide 
which is most likely to fire for the particular event, only if the trigger is meaningful for 
that event, i.e. if the event passes the jet selection required for the trigger to be 
considered and if the trigger was actually defined for the run to which the event belongs. 
If this is not the case, the trigger ends up with a Probability of zero, which ensures that 
it does not win over the other triggers. If the Probability is different from zero, then it 
is the probability given by the turnon curves at each of the three trigger levels, as well 
as the prescale of that trigger. 
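
A minimal sketch of this competition between triggers is given here. It assumes that the 
Probability of a trigger is the plain product of the six factors named in the text (wL1, wL2, 
wL3, PS, JS and TD); the function names and all numerical inputs other than the prescales 
0.92 and 1.00 are hypothetical and chosen only for illustration. 

```python
def trigger_probability(w_l1, w_l2, w_l3, ps, js, td):
    """Probability of one trigger, assumed to be the product of its six factors."""
    return w_l1 * w_l2 * w_l3 * ps * js * td


def choose_trigger(factors_per_trigger):
    """Return (name, Probability) of the trigger with the largest Probability.

    factors_per_trigger maps a trigger name to a dict with the keys
    w_l1, w_l2, w_l3, ps, js and td. If every trigger has Probability zero,
    the first trigger is returned with Probability 0.0.
    """
    probabilities = {
        name: trigger_probability(**factors)
        for name, factors in factors_per_trigger.items()
    }
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]


if __name__ == "__main__":
    # Hypothetical event: METDI fails its jet selection (JS = 0) and MET45 is
    # not defined for the run (TD = 0), so only MET2J can win.
    event = {
        "MET2J": dict(w_l1=0.90, w_l2=0.95, w_l3=0.98, ps=0.92, js=1.0, td=1.0),
        "MET45": dict(w_l1=0.99, w_l2=0.99, w_l3=0.99, ps=1.00, js=1.0, td=0.0),
        "METDI": dict(w_l1=0.99, w_l2=0.99, w_l3=0.99, ps=1.00, js=0.0, td=1.0),
    }
    print(choose_trigger(event))
```

Because JS and TD enter as multiplicative factors equal to 0 or 1, a trigger that is not 
meaningful for the event drops out of the comparison without any special-case logic. 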
We implemented this method in a software package called ABCDF¹ that allows for a 
user-friendly inclusion of the method in an analysis. The ABCDF package takes in 
the event information and returns a probability between 0 and 1 for Monte Carlo simulated 
events and either 0 or 1 for data events. A selection cut of probability strictly larger than 
0.001 is necessary to make sure that the data events for which the chosen trigger did not 
fire are rejected from the analysis, as they have a weight of exactly 0.0. ABCDF is part 
of the CDF software repository and can be used by any analysis. In fact, it is already 
being used by several analyses at CDF for the three MET-based triggers. 

The method is general and can be used with any number of MET-based triggers. 
Each trigger comes with its own specific kinematic selection, which may include information 
other than jet information. The method was used in the analysis presented in this 
dissertation and is currently being used by other analyses at CDF. It also has potential 
applications at other experiments, as MET+jet triggers are used for new physics searches, 
such as supersymmetry, at the LHC experiments. The method can be used with other trigger 
types, and the parametrization could even be done as a function of a different variable for 
each trigger, i.e. the weights wLj (j = 1, 2, 3) could be computed by different formulae 
specific to each trigger. 

The total Monte Carlo event count is the count of events that pass the event selection, 
each weighted by the total Probability of the chosen trigger, i.e. the largest Probability 
amongst the available triggers. The systematic uncertainty is calculated by considering 
alternative turnon curves, derived in bins of the kinematic quantities enumerated in 
Section A.5. These change the wL values on an event-by-event basis and may change not only 
the Probability of each trigger, but also the chosen trigger. The largest percentage 
difference of all systematic variations with respect to the standard event count is 
typically taken as a sensible systematic uncertainty. 
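
The event count and its systematic uncertainty can be illustrated with the short sketch 
here. The function names and the numbers are hypothetical; the only ingredients taken from 
the text are that each selected event is weighted by the Probability of its chosen trigger 
and that the largest percentage difference among the variations is quoted. 

```python
def weighted_yield(chosen_probabilities):
    """Sum of the chosen-trigger Probabilities of the selected MC events."""
    return sum(chosen_probabilities)


def largest_percent_difference(nominal_yield, varied_yields):
    """Largest absolute percentage difference of the varied yields
    with respect to the nominal yield."""
    return max(
        abs(varied - nominal_yield) / nominal_yield * 100.0
        for varied in varied_yields
    )


if __name__ == "__main__":
    # Hypothetical Probabilities of three selected events, computed with the
    # nominal turnon curves and with two alternative sets of turnon curves.
    nominal = weighted_yield([0.80, 0.90, 0.70])
    variations = [
        weighted_yield([0.78, 0.86, 0.66]),
        weighted_yield([0.82, 0.92, 0.72]),
    ]
    print(nominal, largest_percent_difference(nominal, variations))
```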

An assumption of our method is that the triggers are very efficient. This is a very good 
assumption in the case of our analysis. However, if the triggers were very inefficient, 
an OR method would provide a significantly higher event yield. One would then have to model 
the correlation between triggers correctly and compute the systematic uncertainty for the 
OR trigger combination. 
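
The size of this effect can be illustrated with a toy comparison. The sketch assumes 
statistically independent triggers, which is a simplification that does not hold for the 
correlated MET-based triggers of this analysis, and uses hypothetical efficiencies; it only 
shows why the gain of an OR is small for efficient triggers and large for inefficient ones. 

```python
from math import prod  # available from Python 3.8


def or_probability(probabilities):
    """Probability that at least one trigger fires, assuming independence."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def relative_gain_of_or(probabilities):
    """Relative yield gain of the OR over using only the most probable trigger."""
    return or_probability(probabilities) / max(probabilities) - 1.0


if __name__ == "__main__":
    print(relative_gain_of_or([0.95, 0.90]))  # efficient triggers
    print(relative_gain_of_or([0.30, 0.30]))  # inefficient triggers
```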



1 The name of the package comprises the author's initials followed by those of the Collider 
Detector at Fermilab experiment. 
Appendix C 

WHAM package in CDF 



C.1 Introduction 

Associated WH production search Analysis Modules (WHAM) is a new data analysis framework 
for the CDF collaboration. It builds on a previous framework developed by the top 
quark study group inside CDF and adds functionality through its improved modular structure. 
WHAM performs all the analysis stages, from the data and Monte Carlo ntuples produced 
by the CDF production group up to the final measurement, such as limits or cross sections. I 
am one of the main contributors to the development and validation of the WHAM code. WHAM 
has been used to produce the results presented in this thesis. 

C.2 Motivation 

The main motivation for WHAM is to combine two WH searches inside CDF: the one using an 
artificial neural network as a discriminant (WHNN, the search presented in this thesis) and 
the one using a boosted decision tree and matrix elements (WHME). Studies inside CDF have 
shown that the two discriminants are not 100% correlated and therefore more information can 
be extracted by combining the two searches. This can be achieved by building a 
superdiscriminant that takes as inputs the event-by-event values of the discriminants of 
each analysis. The latest CDF combination of WHNN and WHME showed a 10% increase in 
sensitivity over the best-performing analysis [2]. 

The key words here are "on an event-by-event basis". This means that both analyses must have 
exactly the same event selection. Both analyses must also reconstruct kinematic quantities 
such as jet energies, missing transverse energy and dijet invariant mass in the same way, 
and obtain the same values for them on an event-by-event basis. So far each analysis has 
used its own framework, its own definition of loose charged leptons and its own way of 
correcting kinematic quantities for various effects. This makes a combination of WHNN and 
WHME very time- and resource-consuming. The combination cited above is the only one 
achieved so far. 
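
The kind of event-by-event cross-check that such a combination requires can be sketched as 
follows. This is not WHAM code: it assumes, purely for illustration, that each analysis 
provides its selected events as a dictionary keyed by (run number, event number), and all 
names and values are hypothetical. 

```python
import math


def compare_selections(events_a, events_b, quantities, rel_tol=1e-6):
    """Compare two event selections on an event-by-event basis.

    events_a and events_b map (run, event) to a dict of kinematic quantities.
    Returns the events found only in A, the events found only in B, and the
    list of (event key, quantity) pairs whose values disagree.
    """
    only_a = sorted(set(events_a) - set(events_b))
    only_b = sorted(set(events_b) - set(events_a))
    mismatched = []
    for key in sorted(set(events_a) & set(events_b)):
        for name in quantities:
            if not math.isclose(events_a[key][name], events_b[key][name],
                                rel_tol=rel_tol):
                mismatched.append((key, name))
    return only_a, only_b, mismatched


if __name__ == "__main__":
    whnn = {(1001, 1): {"met": 35.2, "mjj": 112.0},
            (1001, 7): {"met": 41.0, "mjj": 98.5}}
    whme = {(1001, 1): {"met": 35.2, "mjj": 113.1},
            (1002, 3): {"met": 28.9, "mjj": 120.4}}
    print(compare_selections(whnn, whme, ["met", "mjj"]))
```

A common framework makes the three returned lists empty by construction, since both 
discriminants are then evaluated on the same reconstructed events. 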

The goal of WHAM is to allow an easy combination of these two analyses at any desired 
moment. This will allow the CDF collaboration to present combined, and therefore improved, 
results for the WH search at each conference cycle. 

But that is not all. WHAM also makes it easy to perform searches and measurements with the 
same signature as the WH one, or a very similar one: Technicolour and ttH searches, as well 
as WZ and single top measurements. So far these analyses have had their own frameworks. 

Because all these analyses can be done in a common framework, an improvement made by one 
collaborator, once integrated in WHAM and validated for one analysis, can be used 
automatically by all the other analyses and thus help improve them as well. 

One last point is that WHAM does all the steps from the original data and Monte Carlo 
simulations up to the final result with a minimal number of actions from the user. As 
the number of CDF collaborators is decreasing, WHAM is very helpful in the last years of CDF 
data taking and data analysis, because a single postdoctoral researcher will be able to 
update all these analyses with new data very easily. 

C.3 Implementation 

The WHAM code is organized in folders with suggestive names, each holding code with a 
very specific goal: setup, inputs, commands, modules and results. 

The folder "setup" allows for the easy setup of the entire analysis framework. 

The folder "inputs" contains all the inputs needed for the analyses: 

- Lists of files with events to be processed, either data or Monte Carlo signal or back- 
ground simulated events; 

- Lists of data runs with good quality data; 

- Text files with information needed for the analysis, such as analysis cuts, tasks to be 
performed, luminosities and data-simulation scale factors. This makes it possible to change 
input parameters of the analysis without recompiling the code. These input parameters 
are also always saved with the results, so they can be retrieved if there is any doubt 
about the input parameters of a given result. 
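
The idea of text-file inputs that are archived together with the results can be sketched as 
follows. The "key = value" file format, the function names and the parameter names are 
assumptions made for this illustration; the actual format of the WHAM input files is not 
described in this appendix. 

```python
import shutil
from pathlib import Path


def parse_parameters(text):
    """Parse 'key = value' lines; '#' starts a comment, blank lines are skipped.

    A non-empty line without '=' raises ValueError.
    """
    parameters = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, value = line.split("=", 1)
        parameters[key.strip()] = value.strip()
    return parameters


def load_and_archive(parameter_file, results_dir):
    """Read the parameters and save a copy of the input file with the results."""
    parameter_file = Path(parameter_file)
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(parameter_file, results_dir / parameter_file.name)
    return parse_parameters(parameter_file.read_text())


if __name__ == "__main__":
    example = "min_met = 20.0  # GeV, hypothetical cut\nn_jets = 2\n"
    print(parse_parameters(example))
```

Since the values are read at run time, changing a cut only requires editing the text file, 
and the archived copy documents which inputs produced a given result. 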

The folder "modules" contains, in a modular way, most of the code that performs tasks in 
WHAM: 

- The folder "dep" contains the dependency files with extensions .o and .d produced 
during the compilation of the code; 

- The folder "shlib" contains the shared objects with extension .so produced during the 
compilation of the code; 

- The folder "inc" is the include directory that contains symbolic links to all the packages 
inside the "modules" folder; 

- The folder "external" contains all the packages already existing in the CDF software 
archive that WHAM uses, such as high-level object reconstruction code, the b-tagging 
algorithms and their mistag matrices, background calculation methods, the limit calculation 
method, tools for manipulating .root files, Boost¹, NKRoot² and ABCDF³; 

- The folder "native" contains all the packages produced only for WHAM, such as event 
reconstruction, various computations such as the discriminant output on an event-by-event 
basis, the making of analysis trees from the original data or simulated events from the CDF 
production group, and the making of histogram ROOT files that are used for background 
estimation. 

The folder "commands" contains code for various tasks, such as submitting computing jobs to 
CDF's Central Analysis Farm (CAF), the limit calculation code, various data analysis ROOT 
macros and various scripts that read log files and process the information they contain or 
merge existing ROOT files. 

The folder "results" contains all the results of tasks from "modules". Here we store the 
ROOT trees produced with WHAM using CAF, which are then used for background estimation 
and limit calculation, the limit results, text files listing the events that pass 
our event selection together with their kinematic properties (event dumps), histogram files 
used for background calculation, and various smaller trees for studies such as signal 
acceptance improvement or jet energy resolution improvement. 

C.4 Future Plans 

At the time this thesis is submitted, only the WHNN analysis has been performed 
and validated using the WHAM framework. Other analyses are in progress, such as single 
top measurements and Technicolour and WZ searches. 



1 Boost provides free peer-reviewed portable C++ source libraries at http://www.boost.org 



2 NKRoot is a collection of tools for ROOT file manipulation needed for particle physics data 
analysis, developed by Nils Krumnack while a postdoc on CDF and still under development now 
that he is a researcher on ATLAS. The package is freely available to anyone at 
http://www-cdf.fnal.gov/~nils/root/ 

3 ABCDF is the software package I developed in order to model the trigger simulation for 
missing energy + jets triggers. It is placed in the "external" folder as it can be used by 
other CDF frameworks and analyses as well. 
Appendix D 

Control Plots 



This appendix presents a selection of relevant plots for our analysis. In each plot, all 
backgrounds, properly normalized, are stacked. Data points are overlaid in order to show 
the good modelling of the backgrounds. The two signal processes are also overlaid, after 
having been multiplied by a factor in order to be visible on the plots. Only the most 
sensitive analysis channel (TIGHT SVTSVT) is shown, in order not to make the thesis too 
long. Since the Pretag category is a control sample for each analysis channel, the 
TIGHT Pretag category is also shown. 

For each category, a collection of plots relevant to the kinematic distributions of the event 
is shown, such as the transverse energy, η and φ of the two jets and the charged lepton, as 
well as the transverse mass of the W boson, the Δφ between the MET and each jet 
and the charged lepton, followed by the ΔR between the two jets. Also for each category, 
a collection of plots showing the variables used as input to the BNN final discriminant, as 
well as the BNN output for a Higgs mass of 115 GeV/c², is shown. 
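
For reference, the angular and mass variables named above can be computed with the standard 
collider definitions, sketched here in Python. The formulas are the usual textbook ones (Δφ 
wrapped to [-π, π), ΔR as the quadrature sum of Δη and Δφ, and the transverse mass built from 
the lepton transverse momentum and the MET); the function names are hypothetical and this is 
not the analysis code. 

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation phi1 - phi2, wrapped to the range [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Separation in the eta-phi plane: sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def transverse_mass(lepton_pt, lepton_phi, met, met_phi):
    """W transverse mass: sqrt(2 * pT * MET * (1 - cos(d_phi)))."""
    dphi = delta_phi(lepton_phi, met_phi)
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(dphi)))


if __name__ == "__main__":
    # Hypothetical lepton and MET, back to back in the transverse plane.
    print(transverse_mass(40.0, 0.0, 40.0, math.pi))
    print(delta_r(0.5, 0.1, -0.5, 0.1))
```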
Figure D.1: First part of the control plots for TIGHT charged lepton Pretag kinematic 
variables. 



[Figure panels: Central Lepton, Pretag, CDF Run II Preliminary (5.7 fb^-1). Horizontal axes: 
Lepton Pt (GeV/c), Lepton φ, Lepton η, Missing Transverse Energy (GeV), MET φ, Transverse 
Mass (GeV/c²).] 



Figure D.2: Second part of the control plots for TIGHT charged lepton Pretag kinematic 
variables. 
Figure D.3: Third part of the control plots for TIGHT charged lepton Pretag kinematic 
variables. 

Figure D.4: First part of the control plots for TIGHT charged lepton SVTSVT kinematic 
variables. 

Figure D.5: Second part of the control plots for TIGHT charged lepton SVTSVT kinematic 
variables. 

Figure D.6: Third part of the control plots for TIGHT charged lepton SVTSVT kinematic 
variables. 

Figure D.7: First half of the control plots for TIGHT charged lepton SVTSVT BNN Input 
Variables. 

Figure D.8: Second half of the control plots for TIGHT charged lepton SVTSVT BNN Input 
and the BNN Output Variable for the Higgs mass of 115 GeV/c². 
Appendix E 



Glossary 

• b-tagging - The process of identifying whether a jet originates from a bottom quark 

• ABCDF - The software package I wrote for trigger combination 

• ALPGEN - Monte Carlo event generator 

• ANN - Artificial Neural Network 

• BNN - The final analysis discriminant, the output of an artificial neural network 

• BSM - Theories beyond the Standard Model theory 

• CDF - Collider Detector at Fermilab 

• CEM - Central Electromagnetic Calorimeter 

• CES - Central Electromagnetic Shower Maximum Detector 

• CHA - Central Hadronic Calorimeter 

• CL - Confidence Level (in the frequentist approach) and Credibility Level (in the 
Bayesian approach) 

• CLC - Cherenkov Luminosity Counter 

• CMP - Central Muon uPgrade Detector 

• CMU - Central Muon Detector 

• CMUP - Muon candidates with hits both in the CMU and CMP detectors 

• CMX - Central Muon extension Detector 

• COT - Central Outer Tracker, the drift chamber used for tracking 

• CSL - Consumer Server/Logger 

• DiTop - The background sample of top quark pair production 

• FNAL - Fermi National Accelerator Laboratory 

• FSR - Final State Radiation 

• ISL - Intermediate Silicon Layers, the third subdetector of the silicon detector 

• ISOTRK - The central loose charged lepton category (mainly loose muon candidates) 

• ISR - Initial State Radiation 

• JES - Jet Energy Scale, one of the most important sources of systematic uncertainty 
for this analysis, as well as the only shape systematic 

• JetProb - Jet Probability b-tagging algorithm 

• L00 - Layer 00, the first subdetector of the silicon detector 

• L1 - The first trigger level 

• L2 - The second trigger level 

• L3 - The third trigger level 

• LHC - The Large Hadron Collider 

• MADEVENT - Monte Carlo event generator 

• MC - Monte Carlo simulation 

• MET - Missing transverse energy 

• MET2J - The first of the MET-based triggers 

• MET45 - The second of the MET-based triggers 

• METDI - The third of the MET-based triggers 

• Obs - The number of data events observed 

• PDF - Parton Distribution Function 

• PEM - Plug Electromagnetic Calorimeter 

• PES - Plug Electromagnetic Shower Maximum Detector 

• PHA - Plug Hadronic Calorimeter 

• PMT - Photomultiplier Tube 

• Pretag - The sample of events that pass all the event selection requirements, before 
any b-tagging requirement is applied 

• PYTHIA - Monte Carlo event generator and parton showering program 

• QCD - The background sample of pure QCD production faking a W boson production 

• SecVtx - Secondary Vertex b-tagging algorithm 

• SM - The Standard Model of elementary particles and their interactions 

• STopS - The background sample of single top production in the s channel 

• STopT - The background sample of single top production in the t channel 

• SUSY - The principle of supersymmetry, which leads to several theories beyond the 
Standard Model 

• SVT - Secondary vertex reconstruction at the L2 trigger level 

• SVTJP05 - One b-tagging category, where one jet is tagged by the Secondary Vertex 
algorithm and the other jet by the Jet Probability algorithm 

• SVTnoJP05 - One b-tagging category, where one jet is tagged by the Secondary 
Vertex algorithm and the other jet is not tagged by the Jet Probability algorithm 

• SVTSVT - One b-tagging category, where both jets are tagged by the Secondary 
Vertex algorithm 

• SVX-II - Silicon Vertex Detector, the second subdetector of the silicon detector 

• Technicolor - A family of theories beyond the Standard Model 

• TIGHT - The central tight charged lepton categories (central electrons and central 
muons) 

• TOF - Time of Flight 

• Wbb - The background sample of W boson + bb 

• Wcc - The background sample of W boson + cc and W boson + cj 

• WH - The main signal process for the Higgs boson search described in this thesis 

• WH115 - The signal sample of WH associated production when the Higgs boson has 
a mass of 115 GeV/c² 

• WHA - Wall Hadronic Calorimeter 

• WHAM - WH Analysis Modules, the data analysis framework of which I am coauthor 

• Wlf - The background sample of W boson + light flavour jets incorrectly tagged as 
heavy flavour (mistags) 

• WW - The background sample of WW 

• WZ - The background sample of WZ 

• XFT - eXtremely Fast Tracker - track reconstruction at L1 trigger level 

• ZH - The second signal process for the Higgs boson search described in this thesis 

• ZH115 - The signal sample of ZH associated production when the Higgs boson has 
a mass of 115 GeV/c² 

• Zjets - The background sample of Z boson + jets 

• ZZ - The background sample of ZZ 